June 25-26, 2005
Phrontisterion 6 operated a little differently. The first day was devoted to presentations of the two primary technologies extant for interactive storytelling: Erasmatron 4 (primarily Deikto), and Facade.
I launched Saturday morning with a discussion of Deikto, the toy language that forms the basis for Erasmatron 4. My first axiom was a variation on the Sapir-Whorf hypothesis that there exists a 1:1 mapping of words to concepts in a person’s perceptual universe. My variation represents a weak version of the Sapir-Whorf hypothesis and does not concern itself with thinking styles; its focus is on differentiation of individual concepts, not cognition itself.
My second axiom is that a storyworld, game, or simulation creates a subset of the author’s perceptual universe that concentrates attention on some narrow set of ideas that the author wishes to communicate to the audience. I argue that an image needs shadow as much as light; that adding more 1s to a binary string doesn’t add to its information content because 1s without 0s are meaningless. Subtraction is just as much a part of expression as addition, and this process of subtraction always yields an imaginary universe that is a subset of the full perceptual universe.
Put these two axioms together and you get a diagram like this:
This is my neologism for the knowledge of how to build toy languages. Sounds really high-tech, doesn’t it? That should make it more palatable to those who object to the term toy languages. I dragged the attendees through a long, boring history of writing, going from pictographic languages to logomorphic languages to syllabic languages and finally to alphabetic languages. My basic point was that the logomorphic systems are not intrinsically inferior to alphabetic writing systems; they are more practical for written languages with small vocabularies of a few hundred words. For toy languages with vocabularies of a thousand words or more, we surely want to use alphabetic language systems, but this leads us into a dilemma: the only alphabetic systems that people know are natural language systems. Such systems bring along with them the tens of thousands of words in the average user’s written vocabulary -- far too many. The dilemma we face, then, is this: how do we get a good written language whose vocabulary size is in that awkward middle ground between the few hundred that’s ideal for a logographic language and the many thousand that is ideal for an alphabetic language?
My solution is to create a hybrid toy language that is logographic in operation but alphabetic in presentation. Deikto uses words taken from the English vocabulary, spelled in English, but they are taken as complete, indivisible morphemes. Deikto can have cat but not cit; only words that I define are included in the Deikto dictionary.
Deikto is defined by its dictionary. Indeed, in Erasmatron 4, the dictionary is the storyworld. This is Sapir-Whorf with a vengeance: the dictionary defines the universe. Every word in the dictionary includes all the algorithms and data structures that define its operation in the storyworld. Language and universe are one and the same.
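The idea that a word carries its own behavior can be sketched in a few lines of Python. This is only an illustration of the principle, not Deikto's actual design: the words `greet` and `insult`, the `affection` variable, and the function names are all hypothetical. Each dictionary entry is looked up whole, so a misspelling such as `gret` is simply not a word in this universe, just as Deikto can have cat but not cit.

```python
from typing import Callable, Dict, List

# The storyworld state is a plain mapping of (hypothetical) variable names to numbers.
State = Dict[str, int]


def greet(state: State) -> None:
    """Hypothetical effect of the word 'greet': raise affection by one."""
    state["affection"] = state.get("affection", 0) + 1


def insult(state: State) -> None:
    """Hypothetical effect of the word 'insult': lower affection by two."""
    state["affection"] = state.get("affection", 0) - 2


# The dictionary IS the storyworld: each word maps to the code that defines
# what it does. Nothing outside this table exists.
DICTIONARY: Dict[str, Callable[[State], None]] = {
    "greet": greet,
    "insult": insult,
}


def lookup(token: str) -> Callable[[State], None]:
    """Treat the token as one indivisible unit; no spelling analysis is done."""
    if token not in DICTIONARY:
        raise KeyError(f"'{token}' is not a word in this storyworld")
    return DICTIONARY[token]


def run(tokens: List[str], state: State) -> State:
    """Apply each word's effect to the state, in order."""
    for token in tokens:
        lookup(token)(state)
    return state
```

Adding a word to such a language means adding one entry together with its behavior, which is the sense in which language and universe are one and the same.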
I had much more to say, but then, that is always the case...
Saturday afternoon saw Michael Mateas and Andrew Stern presenting Facade, their interactive storyworld. I maintain that this is the first genuine interactive storyworld of consequence, and thus deserves an important place in the history books. Michael and Andrew are too busy with shipping out Facade to write up something special for this page, but they have already prepared lots of material on the subject; they recommend the following links:
Facade: An Experiment in Building a Fully-Realized Interactive Drama (2003)
a good overview paper, including our design goals, tour of the architecture, some analysis of failures and successes
Structuring Content in the Facade Interactive Drama Architecture (2005)
talks in some detail about story structure, alludes to the authoring process
A Behavior Language: Joint Action and Behavioral Idioms (2004)
talks about the behavior language ABL
Natural Language Understanding in Facade: Surface-text Processing (2004)
talks about the NLP technology
Grand Text Auto essays (2003-5)
a variety of essays on agency, AI, interactive drama, etc.
(Their own report on Phrontisterion 6 can be found at Grand Text Auto, their most excellent blog.)
Sunday was devoted to a deeper discussion of some of the questions that arose during Saturday’s discussions.
Continuous vs discrete time
This question arose from one difference between Facade and Erasmatron: Erasmatron proceeds turn by turn, whereas Facade operates without turns. (Actually, Facade’s structure is more sophisticated than this: its presentation is continuous while its internal operation is discrete.) While there seemed to be agreement that stories proceed event by event, and therefore have some intrinsic quantization, the question, as Michael put it, was whether an interactive storyworld should wear its quantization on its sleeve. I shall simply summarize some of the main points raised [Phronties, feel free to contribute expansions of any of these bullet points]:
- Continuous time puts time pressure on the player
- The question might better be framed as "quantized time vs punctuated continuity"
- Conversation is better when taken in turns, worse when continuous.
- The conversational metaphor for interactivity suggests that interactivity is intrinsically discrete.
- Good sex is continuous, not discrete, and constitutes good interactivity.
- Quantization is a mental construct, continuity is more realistic.
- Quantization is better for cognitive processes and continuity better for physical processes.
- Continuity is harder to build, quantization is easier.
- Turn-based play more gracefully handles external interruptions than continuous play.
- Even a continuous system is discrete at some time granularity (milliseconds, seconds, minutes, days, etc).
- The fundamental unit of interaction is the event.
After some discussion, the group endorsed the following mealy-mouthed digestion that I prepared:
The distinction between discrete and continuous time is not fundamental. All interactive storytelling systems are to some degree discrete. Continuity is more reactive and allows application of time pressure. Discreteness supports a more reflective style of play.
Continuous versus discrete space
This concerns the question of how space should be organized in a storyworld. I have long argued that the continuous space used in games is inappropriate to dramatic presentations, which have always been presented with discrete space: stages. There was general agreement with my position. Here are a few basic points made during this short discussion:
- Continuous space is an unnecessary device.
- Cartesian coordinates are never part of a story.
- Continuous space might be integrated into the presentation of the story.
- Emphasis on continuous space arises from the availability of good tools (the hammer effect).
"Personality cannot readily be expressed through application of personality models."
This was the most contentious issue raised during the weekend. Michael expressed his skepticism that a few dozen personality parameters can offer sufficient differentiation between actors to permit them to take revealingly different courses of action. As he succinctly put it, "Generic decision-making leads to generic behavior."
Chris defended personality modelling with several arguments. First, character is not revealed through a single decision, but through a body of decisions. Thus, while a single inclination equation might yield identical results for both Snidely Whiplash and Princess Leia, over the course of time the differences between the two characters will manifest themselves through different inclination equations in differing decisions.
Chris’ second argument is that actor-specific decision-making mechanisms are too burdensome for storyworlds including large numbers of actors. If a storyworld has 500 verbs but only two NPCs, then a mere 1,000 cases must be considered; but the same storyworld with 20 NPCs will require 10,000 cases to be provided for. The abstraction provided by inclination equations based on personality models allows each verb to cover an unlimited number of actors.
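The arithmetic and the abstraction argument can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the trait names `greed` and `loyalty`, the weights, the trait values, and the verb "betray" are invented, and the equation is not taken from Erasmatron. The point is only that one equation per verb serves any number of actors, while hand-written cases grow as verbs times actors.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Actor:
    name: str
    # Hypothetical personality traits, each assumed to lie in [-1.0, 1.0].
    traits: Dict[str, float]


def per_actor_cases(verb_count: int, actor_count: int) -> int:
    """Cases to author by hand if every verb is customized for every actor."""
    return verb_count * actor_count


def inclination_to_betray(actor: Actor) -> float:
    """One hypothetical inclination equation for one verb.

    The same formula is evaluated for every actor; only the trait values
    differ, so adding actors adds no new authoring work for this verb.
    """
    return 0.7 * actor.traits["greed"] - 0.3 * actor.traits["loyalty"]


# Illustrative trait values only; they are not drawn from any real storyworld.
snidely = Actor("Snidely Whiplash", {"greed": 0.9, "loyalty": -0.8})
leia = Actor("Princess Leia", {"greed": -0.5, "loyalty": 0.9})

if __name__ == "__main__":
    print(per_actor_cases(500, 2), per_actor_cases(500, 20))
    for actor in (snidely, leia):
        print(actor.name, inclination_to_betray(actor))
```

With these invented numbers the villain comes out positively inclined and the heroine negatively inclined, which is the kind of differentiation the personality model is supposed to provide; whether a few dozen such parameters suffice is exactly what Michael questioned.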
This issue of abstraction generated interesting discussion. Chris argued that the highest level of abstraction is the ideal towards which we should strive; Michael argued for layered abstraction, with higher levels of abstraction for some problems and lower levels for other problems. Chris acknowledged the necessity of Michael’s approach for the immediate future.
Laura pointed out that the Erasmatron offered the layered approach Michael aimed for with its role system. It is possible to build roles customized to just one actor, or just a few. And with this note of compromise, we wandered on to the next topic.
Logographic toy language vs the "mellow parser"
Facade’s parser maps a great many expressions down to a small number of basic concepts; Chris dubbed this a "mellow parser" because it gracefully accepts and responds to a great variety of expressions. We therefore hurled Chris’ Deikto against Michael and Andrew’s mellow parser to see how they bounced off each other. Herewith some of the points made:
- "Agency demands the logomorphic approach."
- "Mellow parser gives the illusion of freedom."
- "Yes, but it’s a misleading illusion quickly exposed."
- "What if Facade displayed its interpretation of the player’s input?"
- "We must take baby steps before we can do real language."
- "What is the difference between illusion and deception?"
- "A mellow parser has plenty of expansion capability."
- "A good storyworld should present the user with a maximum of 10 verbs at each juncture."
The group was able to agree on this broad point:
The player should have as much freedom as the artist can support the consequences of.
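To make the "mellow parser" idea concrete, here is a deliberately crude Python sketch of mapping many surface expressions down to a few basic concepts. It is not Facade's parser, whose actual design is described in the papers linked above; the concept names and keyword lists are hypothetical, and simple keyword matching stands in for whatever a real system would do.

```python
import re
from typing import Dict, List, Optional

# Hypothetical concept table: many surface words collapse to a few concepts.
CONCEPT_KEYWORDS: Dict[str, List[str]] = {
    "agree": ["yes", "sure", "right", "agree"],
    "disagree": ["no", "wrong", "disagree"],
    "comfort": ["sorry", "okay", "calm"],
}


def interpret(text: str) -> Optional[str]:
    """Return the first concept whose keywords appear in the text, else None.

    Input is accepted gracefully whatever it says; anything that matches no
    keyword is not rejected, it simply yields no concept.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    for concept, keywords in CONCEPT_KEYWORDS.items():
        if words & set(keywords):
            return concept
    return None


if __name__ == "__main__":
    for line in ["Yes, you're right", "No way, that's wrong", "It's okay, calm down", "asdf"]:
        print(repr(line), "->", interpret(line))
```

Because the function returns the concept it settled on, a system built this way could show the player its interpretation of the input, which is the suggestion raised in the list above. A toy language like Deikto takes the opposite route and only lets the player say what is in the table to begin with.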
Few universal verbs vs many contextually constrained verbs
Here we came to another interesting disagreement. One school of thought held that games are unlike storyworlds in that games require a quick learning curve and therefore should provide a small set of verbs that are universally available, whereas storyworlds should provide a wide range of verbs, each of which is available only in contextually appropriate situations. The opposing school felt that the philosophy of universally available verbs was just as applicable to storyworlds as to games. Its argument was that contextual restrictions on verbs made the player feel manipulated. It was also pointed out that audience expectations drive this question, and so game players will expect universally available verbs, while new players will have no such expectations. The issue was most sharply put with the challenge-question, "Should the player be able to kiss Darth Maul?" The two schools gave opposite answers to this question, and there we left the issue.
Games vs interactive storytelling
We next took up the question of the relationship between interactive storytelling and the world of games. While somebody had previously argued that the similarity of the products suggested a similarity of markets, Chris suggested that interactive storytelling is to games as wine is to beer: similar in many ways, but appealing to very different markets. Another person observed that the technologies of interactive storytelling are so different from the technologies of games that two very different sets of talents are required, further distancing interactive storytelling from games. Another countered that interactive storytelling technology could be integrated into games, but several people objected that such integration would never work, as the dramatic elements would be subordinated to gameplay issues and the storytelling aspect would thereby be destroyed. Another point was that, if we attempt to reach gamers, they will revert to their game-playing behaviors, to the detriment of the storytelling experience. Gamers don’t value story and bring the wrong expectations to the experience. Yet another person observed that games have a reputation for tawdriness and we don’t want to be associated with games. But another person asked, "Can’t we sell to both groups?" The group was divided on that question.
We were able to reach true consensus on a key point. When asked to vote on the question of whether interactive storytelling should proceed on a revolutionary basis or on an evolutionary basis, the group was unanimously in favor of the revolutionary approach.
Stylized graphics vs photorealism
The last question the group took up concerned the ideal style of graphics. Should we follow the drive towards photorealism that the games industry is pursuing? The group agreed that we should reject photorealism in favor of stylized graphics. When asked whether the stylized graphics used in Facade are good enough to get the industry started, the group answered with a unanimous affirmative.
Commentary from Rick Smith
In the past, I have written long reports but this time I will be very brief. As for Phront. 6 it was great. None of this namby-pamby theoretical stuff, we had an actual product to poke at! In my critiques of Facade below, please remember that this is from someone who can’t do what they have done, discussing the work of his betters.
In this drama, you visit the house of two friends whose marriage is close to breaking up. You talk to them in English (typing short sentences) and they respond to you using voice acted speech. Depending on what you say, you can be kicked out, one of them can walk out of the marriage, things can stay pretty much the same after an uncomfortable evening or they can start communicating better with the hope that they can work thru their problems. The characters show emotion by both what they say and by the expressions on their faces.
First impression was WOW! The expressions on the faces of the characters were really subtle. The interactive drama attempts to allow you to speak in real language in real time to the characters. Tho they do not react correctly to everything you say, often the characters are able to piece together if you are trying to be helpful or not by the end of the drama. This is very close to something that the public would pay real money for.
A number of things I don’t like about Facade are concentrated in the user interface and were deliberate design decisions by Andrew and Michael.
I found that the drama was paced to the rate of typing of the creators, so that there were several times when I lost my turn to speak by typing too slow. This usually happened if I made a spelling error while trying to type in a sentence quickly.
They actually slow things down if you are typing when things are calm, but if the two start fighting (and they will) the pacing picks up and you have to type fast to get a word in edgewise.
A pause button to allow me to answer the phone (or fix a misspelling or reword my sentence so it fits into the 25 characters allowed) would be nice. However this spoils the immediacy that they are trying for.
Given that they are going to much effort to NOT have a ’computer like’ user interface, typing your sentences is more awkward than simply speaking into a microphone. Of course recognizing speech would add another layer of misunderstandings to what the NPC’s can pick up about what you are trying to say. It is clear to me that they made the right choice for Facade, but a verbal language interface would be a natural expansion to their ideas if they had a lot more resources to attack the current problems.
When I build a user interface I am very clear what is possible for the user to do at any time and make sure that I give feedback on whether the user input is successful. In trying to allow for a natural English ui, and by making the drama continue even if the NPC’s don’t understand your input, these guidelines are violated.
Andrew very strongly stated that he did not like prompting people with allowable verbs. This would ’break’ the illusion of the drama and makes it harder to surprise the player. If the drama can understand 20 verbs, then all 20 should be available to the ui at all times, even if sometimes the verb is simply ignored. The upside to this design decision is clear: the drama is the closest that we have ever seen to a computer drama where players can walk in and take part naturally. The disadvantage is that there are many things that the player can type that should make sense but the NPC’s can’t understand.
I often found trying to communicate frustrating. I would say a reasonable, calming sentence and they would fly off the handle. (The fact that both characters are emotionally distraught is a deliberate explanation as to why they often misinterpret what you are saying. They are so wrapped up in themselves, they are not listening to you very well.)
But I was trying to play the role of a character who is well meaning but a little dim. I was unable to communicate this to the engine, they didn’t listen to me well enough to pick up this level of subtle interaction. I felt that the only roles I could play were marriage counselor and shit disturber.
I would rather have a less natural form of user input that was understood, than the current system that claims that any input is acceptable but does not understand most of what is said. But I would like to emphasize, even if their natural language parser did not understand everything I said, after several sentences by me, 4 times out of 5 it did figure out in broad terms what role (helpful person or shit disturber) I was playing.
Another concern is how easy would this be to expand to more NPC’s and larger dramas? The drama as it is currently built hides a number of weaknesses. Only 2 NPC’s. They are upset so they don’t listen closely to what you say. Key words and thoughts push ’buttons’ that set them off. Most of the communication is about them and their history. The fiction is very short, etc.
However, as you add more NPC’s the number of things that you can talk about goes up steeply. You need to start discussing relationships BETWEEN NPC’s and the player character(s). A longer, more complicated plot, (say a murder mystery) would require discussion on many different subjects. And all this requires far more precision in language than the current generation of Facade offers. The poor way in which it understands the player’s statements is the greatest weakness in this version of Facade, in my opinion.
Do not think tho that I am down on Facade! I would pay money for this offering (in fact I did). I will show this to my students and coworkers. I felt a connection to the characters in Facade much like what I did when I played Shattertown. That feeling of what I say makes an emotional difference to the NPC’s is something sadly lacking in most games today.
The most important thing about Facade is that it is first. It proves that a complete interactive fiction can be built. So discussion will move from: "is it possible", to "can it be improved".
And that is a fundamental shift in thinking...
I will put off any large discussion about this until I actually see it in operation. It takes a completely different tack from Facade: in Facade we have a natural language parser which often can’t understand what you are trying to say. Chris’s user interface will ALWAYS be understood by the NPC’s but what you can say is sharply constrained. (A toy language after all.) If Chris can finish it we will have to see if people will accept this system.
Continuous vs Discrete Time
Some subtle points in this discussion that I had not considered. Basically, time is continuous, but polite conversation is discrete. (If everyone is yelling at once and not listening to what others are saying then this breaks down.) Chris’s technology is polite in that it waits patiently for the user to respond when something happens that involves the user. I wonder if Chris’s technology can continue to operate if the user does not want to say anything and does not respond?
Continuous vs Discrete Space
This has been talked about many times in previous Phront. and I am finding the whole subject a bit dull. Chris uses stages, whereas in Facade, where the PC and NPC’s stand in the room (x-y space) affects and reflects what is going on. (If you stand close to Trip then this means something different than if you are equally spaced between him and Grace.) People will use whatever system is easiest for their product. Erasmatron would be easier for people to use with other products if its API’s (Application Programming Interfaces) allowed continuous time and space. Chris has chosen not to do so and thus his technology will be harder for people to adapt if they are using Cartesian coordinates or continuous time in their product.
Personality Can’t Be Expressed with Personality Models
I disagree with this. In Shattertown or Siboot you could detect differences in personality using Chris’s algorithms and data. As the technology improves, these personalities will better distinguish themselves.
Mellow Parser & Universal Verbs
From my discussion on Facade above it is obvious that I am bothered by the ui of the Mellow Parser. In the game I am working on we are using a toy language where you can’t say much, but what you can say will be understood. I will use context to limit which verbs are usable at a given time.
This all flows from my philosophy of user interfaces. They should be clear about what inputs are acceptable and feedback should be given if something is said that is not understood.
Interactive Stories vs Games
I think that the two should be able to flow into each other and borrow ideas from each other. I also think that the games industry is in a rut and there is no chance of it funding Interactive Dramas. So we are on our own until we start making money. Then the game industry will have been there for us all along, of course.
All in all this was the most exciting Phrontisterion ever and my thanks to Chris for putting it on. I would like to thank Michael and Andrew especially, for building Facade and showing it to us.
The next Phrontisterion will be held on the weekend of June 24-25, 2006.
The attendees of Phrontisterion 6
Standing, left to right: Michael Mateas, John McCullough, Mark Covey, Steve Kearsley, Dave Walker, Andrew Stern
Kneeling, left to right: Selene Tan and Koppy, Gordon Landis, Patrick Dugan, Rick Smith, Laura Mixon-Gould, Chris Crawford
In front, left to right: Moose, Auggie
The discussion in progress
Andrew Stern and Michael Mateas explaining Facade
Mark Covey and Dave Walker learning about Facade
Laura Mixon-Gould makes a point to Steve Kearsley, Andrew Stern, and Chris Crawford. Moose is uninterested because she has no food.