I am induced to write this essay by some discussions regarding the ideal user interface for interactive storytelling. There appears to be a yearning for an interface that is emotionally more powerful than the cold system of sentences used in the Storytron technology. This yearning is too vague to be expressed clearly, a fact demonstrating how shallow the thinking behind it is. My hunch is that people would like to see something more cinematic, something more visually compelling. The old writer’s dictum “Don’t talk about it, show it!” springs to mind. Yet the advocates of this thinking have not, I suggest, come to grips with the fundamentals of storytelling.
I have elsewhere presented my argument that the sentence is the fundamental data structure of conscious cognition. We can all agree that sentences tell stories. Even a single sentence can tell a story. The confusion arises from the fact that a story can also be told visually without any use of language whatever. There are plenty of cartoons and movies meant for an international audience that use no language, yet communicate a powerful story.
But interactive storytelling is not cinema, nor should it try to be cinema. Why are humans so dense that they insist upon perceiving the new in terms of the old? Why can’t they just look deeper at the fundamentals? When cinema was new, they insisted on perceiving it as “canned theater”. It took decades for them to realize that cinema is a different medium with different strengths and weaknesses. Now they’re doing the same thing with interactive storytelling, insisting upon seeing it as “cinema with interactivity tacked on”.
Sure, we can do cinema with the computer. We can even do a kind of procedural cinema that is in some manner responsive to the dramatic situation. But let me remind you of my definition of interactivity:
A cyclic process in which two active agents alternately listen, think, and speak.
The human player is one of those active agents. How does the player speak to the computer? In a romantic scene, will the player pucker up and make smooching sounds? Caress the air? Sheesh, be serious!
Language is the only plausible input system for a human player. Someday, I hope, players will be able to use spoken language to express their dramatic intentions, but our technology isn’t good enough for that yet. We can’t even handle unrestricted natural language in text form. Yes, we can handle a subset of natural language in text form; we can even handle a subset of natural language in spoken form. But that’s nowhere near good enough.
Text adventures don’t get much attention these days, but they were the focus of much attention in the 1980s. All text adventures suffered from a nasty problem: the parser puzzle. Sure, you could type in anything at all — but the parser understood only a tiny fraction of the English language, and all too often you spent most of your time figuring out how to express yourself in a manner comprehensible to the parser. In one particularly egregious case, I spent half an hour trying to get the parser to permit me to utilize a rock that it had ostentatiously made me aware of. It had described the rock in some detail and it was obvious that the rock was somehow important. But, try as I might, I could not pick up the rock. I tried “take rock”, “pick up rock”, “grab rock”, “get rock” and every other verb I could think of. Only later did I learn that the solution was “take stone”. The designers thought it clever to require the player to use a synonym for “rock”. Ha. Ha. Ha. Very funny, designers.
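The parser puzzle is easy to reproduce in miniature. What follows is only a sketch, assuming a two-word verb-noun parser with a fixed vocabulary; the names VERBS, NOUNS, and parse are invented for illustration and are not taken from any actual game. The point is that the player's intention is perfectly clear in every attempt, yet only the one phrasing the designer happened to anticipate gets through.

```python
# Hypothetical vocabulary for illustration only; not from any real game.
VERBS = {"take", "get", "grab", "pick up"}
NOUNS = {"stone", "lamp"}  # note: "rock" is deliberately missing


def parse(command):
    """Return (verb, noun) if the command is understood, else None."""
    text = command.lower().strip()
    # Try longer verbs first so multi-word verbs such as "pick up" match whole.
    for verb in sorted(VERBS, key=len, reverse=True):
        if text.startswith(verb + " "):
            noun = text[len(verb):].strip()
            # The verb was fine, but an unknown noun sinks the whole command.
            return (verb, noun) if noun in NOUNS else None
    return None


attempts = ["take rock", "pick up rock", "grab rock", "get rock", "take stone"]
for attempt in attempts:
    outcome = parse(attempt)
    print(attempt, "->", outcome if outcome else "I don't understand that.")
```

Every one of the first four attempts is rejected, and nothing in the rejection tells the player which word was the problem.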
Human language is all but infinite in its complexity. It mirrors the reality in which we live, and that reality is immense and constantly changing. We can create completely new expressions that people will quickly grasp. If I say, “The salesman tried to pull a Trump on me, but I didn’t fall for it”, you know exactly what I mean. But teaching a computer to understand this kind of thing is decades in the future. Any form of linguistic input system that we utilize on a computer will always fall foul of this problem.
So how do we permit the user to express himself through language while avoiding this invisible pitfall? I think that we must abandon all hope of permitting free-form linguistic input, at least for a few decades. Any linguistic input from the user must go through a parser, and because that parser is never perfect, it will surely end up rejecting some inputs as incomprehensible.
What fraction of likely inputs will be comprehensible to a state-of-the-art parser? I believe that the fraction will be so low that the parser will reject too many of the user’s inputs, frustrating the user. This means that, for now, we must eschew free linguistic input in favor of menu-driven input. Rather than post-parse the user’s input, rejecting much of it, I think it better to pre-parse the possible inputs and offer the user a menu of expressions that fit the context.
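To make the distinction concrete, here is a minimal sketch of pre-parsing, assuming that the dramatic context can be represented as a set of facts and that each expression declares the facts it requires. The Expression class, the EXPRESSIONS list, and the fact names are all hypothetical; this is not how Storytron or Siboot is actually implemented. It shows only the principle: every item the player sees is already known to be comprehensible, so nothing the player chooses can be rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    text: str            # the sentence shown to the player
    requires: frozenset  # facts that must hold for the sentence to make sense


# Hypothetical expressions and fact names, for illustration only.
EXPRESSIONS = [
    Expression("I greet you.", frozenset()),
    Expression("I offer you a deal.", frozenset({"has_goods"})),
    Expression("I accept your deal.", frozenset({"deal_offered"})),
    Expression("I reject your deal.", frozenset({"deal_offered"})),
]


def menu(context):
    """Pre-parse: list only the expressions that fit the current context."""
    return [e.text for e in EXPRESSIONS if e.requires <= context]


def choose(context, index):
    """Return the chosen sentence; any valid index is a valid input."""
    return menu(context)[index]


if __name__ == "__main__":
    context = {"deal_offered"}
    for number, sentence in enumerate(menu(context)):
        print(number, sentence)
```

The cost of this design is plain: the player can say only what the designer has anticipated. The benefit is that the player never spends half an hour guessing at vocabulary.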
Particularly difficult is the matter of fine shades of meaning. Even if we could develop natural language parsing to the point where it grasped such nuances, I doubt that we could build dramatic algorithms capable of responding to them. For now, our dramatic algorithms will be simplistic, and plugging sophisticated language into simplistic algorithms will only upset the players.
This is why I believe that an artificial language such as the one used in the Siboot design is necessary for the foreseeable future.