The success that a technology enjoys can often be the source of new challenges. In the early days of the automobile, tire and suspension systems were designed to handle the bad roads and low speeds of the day, but as the technology improved, the demands on tires grew. By the late 1920s a new, completely unanticipated problem had emerged: the tire blowout at speed. A sudden tire blowout at 20 miles per hour is not dangerous, but the same event at 45 miles per hour would cause the driver to lose control, often with fatal consequences. Nowadays, tire designs have reduced both the frequency and the severity of high-speed blowouts.
In much the same way, computer technology has advanced to the point where new problems arise from the very success of the technology. In particular, I’d like to direct your attention to the concept of user interface. It took us a while to appreciate the importance of the concept; for many years "un-interfaces" such as DOS ruled the computer world. But at long last we seem to have accepted the utility of graphical user interfaces. Hooray for us.
Yet, even as we celebrate the triumph of the GUI, we must feel some unease about its future. Yes, windows and scrollbars and menus and buttons are all very well and good, but after designing with these constructs for more than a decade, I am starting to recognize their many limitations.
Perhaps the clearest way to articulate the nature of the problem is to think about any user interface system as a language of communication between user and computer. We can then ask ourselves how rich and expressive any such language is. A simple-minded way to quantify the concept is to talk about vocabulary size, setting aside less quantifiable issues of grammatical richness.
By this way of thinking, the old joystick/joypad videogames afforded vocabularies of only five words: up, down, left, right, and fire. Yes, with cleverness you could theoretically extend this to as many as 19 distinct words, but that was really stretching it.
The text-oriented parser interfaces such as DOS were next in vocabulary size. Theoretically, they can support infinitely large vocabularies, but in practice the vocabularies of all such systems are confined by the capacity of the user’s memory. It’s easy to create a parser interface with a thousand commands, but no user could reasonably be expected to remember all those keywords. Thus, most such systems had actual vocabulary sizes of under a hundred words, except UNIX, which blithely ignored this reality and went straight to computer hell for its sin.
The real value of the GUIs is that they make much larger vocabularies practicable. Menu structures make it unnecessary to remember all the commands, and cleverly designed icons, scrollbars, radio buttons, checkboxes, and pushbuttons all suggest their functions in ways that extend the expressive vocabulary of the user. For example, I just checked the options available on my current word processor (MacWrite Pro, a mid-range product), and I estimate that it supports about 150 fundamental commands, plus additional commands buried inside dialog boxes, plus font and color choices, which could range up to several hundred options themselves. This is a great deal of expressive power for a word processor whose primary selling point is a clean, easy-to-use interface.
However, we will soon begin pushing the upper limits of what GUIs can do. I recognize that there is still room for improvement, but my own use of GUI interfaces for more than a decade tells me that the several hundred word vocabulary of MacWrite Pro is pushing close to the natural limits that GUIs can handle. You can only nest submenus so deep; you can only have so many dialog boxes; you can only have so many icons.
The question then becomes, where do we go from here? How will we obtain larger vocabularies so that our users can have richer interactions with the computer? This problem is not some blue-sky issue that will only affect us in the far future when word processors perform wondrous feats and computers walk on water. I believe that we are facing this problem right now.
Consider the nature of interactive entertainment in light of my definition of interactivity as listening, thinking, and talking. We’ve got the talking part down pat: why, we can do all sorts of wondrous video, sound, and animation. We’re starting to get pretty good with the thinking part, too; we’ve seen smarter internal processing in some of the better products of the last few years.
But it’s the listening part that’s killing us right now. Our users can’t say anything interesting to our programs because we just don’t have a decent vocabulary to offer them. We talk about writing great works of art on the computer, about tugging at users’ heartstrings and making them feel glorious feelings, but how can we emotionally engage users with vocabularies consisting of "up, down, left, right, and fire"? We need to give them vocabularies that let them involve themselves at a human level, and a few score or even a few hundred words in a vocabulary won’t be adequate for this task.
So how can we expand the vocabulary of our user interfaces? The obvious answer is to use natural language; after all, we’ve been using natural language interfaces (in the form of text parsers) ever since Adventure.
The catch, of course, is that none of our natural language interfaces actually work. Yes, there are many impressive parsers out there that can understand an amazing amount of language. But, as with so many other subfields of AI research, we’re starting to realize that getting actual working results is a lot tougher than we originally thought.
In the case of natural language, a strong case can be made that we will never crack the problem. Why not? Because natural language mirrors reality, and you can only understand natural language when you understand the reality that spawned it.
The cruelest test of any parser is the sentence, "Time flies like an arrow." Its challenge arises from a triple ambiguity: the word time can be a verb, an adjective, or a noun. As a verb, the sentence becomes a command to measure flies as if they were arrows. As an adjective, the sentence becomes a declaration that a particular subclass of flies, "time flies," prefer an arrow, presumably to eat. As a noun, the sentence suggests that the dimension of time moves quickly and linearly. Of course, most people have no problem disambiguating this sentence, for we all know that there’s no such fly as a time fly, and nobody in his right mind would ever want to measure fly times in the manner of an arrow. But this disambiguation is dependent upon a detailed knowledge of the world. If you don’t know about the many types of flies (houseflies, tsetse flies, botflies, horseflies), and you don’t know about the process of timing, and how an arrow might or might not be used in such a process, then you can’t disambiguate the sentence.
Although this example is admittedly highly contrived, the problem crops up in more subtle ways in all natural language processing. A sentence of natural language is not an absolute declaration of truth independent of the context in which it takes place. All utterances in natural languages include a wealth of subtle assumptions about the universe in which they take place. In the very first paragraph of this essay, I referred to a "tire blowout". If you had never used an automobile in your life, would you have been able to understand my sentence? I doubt it.
In other words, any successful natural language processor must incorporate the same knowledge of the universe that the speaker possesses. Without that knowledge, natural language cannot be understood. I think we can all agree that giving a computer the same knowledge of the universe that a typical human possesses is a task beyond even our hubris at present. Until we can solve this problem, we have to dismiss the possibility of natural language user interfaces.
But can’t we somehow build a natural language parser that understands most of a language, and then have the computer just say, "Huh?" when the user gets a little too eloquent? Yes, this is certainly possible and in fact it is exactly the strategy that has been used in text adventures. The problem with this approach is that it frustrates the user. There are so many valid ways to express an idea in English, and so few of them can be handled by any reasonable parser, that the human ends up feeling that the parser is a cruel joke. By communicating with the user in his natural language, we suggest that he can communicate with us in the same manner, but when we respond to half of his utterances with "Huh?", the user feels that our interface has misrepresented itself.
This point is important, and is all too often dismissed by computer jocks who don’t appreciate the psychology of user interface. The user must above all feel safe, confident in his interactions with the computer. If the user does not have confidence that input X will yield result Y, then the user will be reluctant to involve himself with the interface. The arbitrariness of partial natural language parsers robs the user of the confidence necessary to interact richly.
So where does this leave us? With a huge gap between the GUIs and the natural language interfaces. We will soon be desperately in need of something to fill that gap. What can we do?
I believe that an answer lies in an earlier body of work which has now fallen on hard times. I refer to the artificial language movement. The earliest speculations about artificial language arose from the hankerings of philosophers for a language that would permit more precise exploration of logical concepts. In the nineteenth century, the concept took on a new goal with the need for improved communications between peoples with different native languages.
In the late nineteenth century the movement suddenly became popular with the popularization of Volapuk and Esperanto. The proponents of these languages, aware of the growing destructive military power of the European nations and eager to preclude a general war, hoped that their languages would be a civilizing and pacifying influence on nations. The efforts continued with increasing success until World War I shattered much of their naive optimism. They picked up the pieces and went back to work, but World War II clobbered them again, and at the same time made much of their work moot by establishing English as a nearly international language. In the fifty years since then, the steady advance of English has rendered discussions of artificial languages academic. Who needs an international language when we already have one, especially when it’s the one we already know? (How very convenient!)
And so artificial languages were, for the most part, forgotten. And that’s where things stand today. But perhaps we could use an artificial language or something like it as an intermediate step between the GUIs and the natural languages. Perhaps it is possible to design a language easy enough for computers to understand, rich enough for people to use, and yet easy for people to learn.
The design problem can be broken into two parts: designing the grammar and choosing the vocabulary. The grammar is the easier of the two problems, as it has generally been held that you can design any grammar at all, and people will learn it. This assumption, by the way, has been called into question by the work of Chomsky, but most of the grammars proposed so far are conventional enough that they don’t run afoul of Chomsky’s discoveries.
The vocabulary is another matter. Any decent language will have a large vocabulary, large enough to present a serious obstacle to its acquisition. The designer of an artificial language must somehow ease the task of learning the vocabulary. Two approaches have been tried: a priori and a posteriori. The former approach attempts to rationalize vocabulary from first principles, to design a system for creating words that is so clear, so rational, that anybody can pretty well guess a word’s meaning just by looking at it. The latter approach, far more common, attempts to recycle the most common or recognizable stems from various languages. Thus, the stem pater can be recognized as meaning father in all the Indo-European languages, so language designers using the a posteriori approach always use something very close to pater in their vocabularies.
Rather than jumping straight into the design of such a language, I’d like to present a description of some of the artificial languages developed in past times, not with the intent of reviving them, but rather to point out some of their more interesting features. This list is certainly not inclusive or even representative; there have been hundreds of artificial languages, each with its own passionate advocates. I present this list as a museum of the most spectacular dinosaurs rather than a taxonomy of all reptiles.
Esperanto is the only commonly recognized artificial language. It was designed by Dr. Louis Zamenhof, of Byalistok, in 1885. The language has some charming aspects, such as its regularized endings and middling-clean syntax. Its vocabulary is a hodge-podge of Latin, English, Teutonic, and Slavic stems tossed together with no discernible system. It represents a series of compromises and committee decisions, and thus lacks the kind of brilliant design that we might learn from. Its importance comes from its exclusive position as the only alternative to English as a viable international language; these days, that doesn’t count for much. Here is the Lord’s Prayer in Esperanto:
Patro nia, kiu estas en la chielo, sankta estu via nomo; venu regeco via; estu volo via, tiel en la chielo, tiel ankau sur la tero. Panon nian chiutagan donu al ni hodiau; kaj pardonu al ni shuldojn niajn, kiel ni ankau pardonas al niaj shuldantoj; kaj ne konduku nin in la tenton, sed liberigu nin de la malbono.
OK, for all you lapsed Christians out there, here is my recollection of the Lord’s Prayer, in 20th century American English:
Our father, who is in heaven, holy is your name. Let your kingdom come to us, and your will be done by us, here on earth as it already is in heaven. Give us our bread each day. Forgive our transgressions even as we forgive those who transgress against ourselves. Please do not lead us into temptation, but liberate us from evil.
Long before computer scientists hijacked the term Interlingua for their own use, an Italian mathematician by the name of Giuseppe Peano used the term for his own artificial language. His approach, long speculated upon by academics, was to clean up Latin and update its vocabulary. The main idea was to get rid of all those ghastly declensions and conjugations and replace them with a single system with no exceptions. (Long-suffering students of Latin leap for joy at the thought.) There had been several previous attempts along these lines, but Peano’s was the most thoroughgoing, and it quickly garnered wide respect among the academics who led the international language movement. His language, originally named Latino sine Flexione, but later picking up the sobriquet Interlingua, attracted wide attention. Its grammar was ruthlessly simplified but its vocabulary was straight out of classical Latin, a fact that led academics to underestimate the difficulty that regular people would have in learning the language. Hey, everybody knows basic Latin, right? An even better feature of the language was that it could serve as a stepping stone for those wishing to learn classical Latin. Put that bullet on the outside of your box; the marketing people will swoon with ecstasy.
Here’s the Lord’s Prayer in Interlingua:
Patre nostro, qui es in celos, que tuo nomine fi sanctificato. Que tuo regno adveni; que tua voluntate es facta sicut in celo et in terra. Da hodie ad nos nostro pane quotidiano. Et remitte ad nos nostros debitos, sicut et nos remitte ad nostros debitores. Et non induce nos in temptatione, sed libera nos ab malo.
Loglan is one of the more recent developments, designed in the 1960s not as an international language, but rather as a tool for academics and a proposed language of communication between humans and computers. The design is on the far end of the "philosophically correct" spectrum in that the language is ruthlessly logical in every aspect. Spelling is so formally defined that there can never be any question as to the spelling of any word. Word length is kept to an absolute minimum; indeed, the Loglan designers attempt to assign a meaning to every distinguishable short sequence of letters. Thus it is an a priori language; little attempt is made to create words that might be recognized by speakers of natural languages. (Although we see a flash of humor in assigning the word bilko to designate a group of soldiers.)
There are some interesting grammatical ideas in Loglan, largely because it made no attempt to be a practical language but was instead a purely academic language. In terms of vocabulary, I found little of interest.
The problem with Loglan is the amount of time required to learn its arbitrary vocabulary. Who wants to remember that kinturka means collaborate, and takmu means convince? As far as I know, nothing ever came of it other than some DoD research grants. Where’s Senator Proxmire when you really need him?
If Giuseppe Peano can clean up Latin to get Interlingua, why can’t we try the same thing with English? Indeed, C. K. Ogden attempted exactly that in the 1920s, coming up with something called "Basic English". His goal was to squeeze English down into 850 words, and he pulled it off. He boiled it down to 100 Operations, 400 General Things, 200 Picturable Things, 100 General Qualities, and 50 Opposites. The entire vocabulary of Basic English fits on a single page. Particularly impressive is the handling of verbs. There are only 18 verbs in Basic English: Be, Come, Do, Get, Give, Go, Have, Keep, Let, Make, May, Put, Say, See, Seem, Send, Take, and Will. What makes the system work is the extension of these verbs with prepositions. You can go for a walk, go for a drink, go to the store, go out of the house, go in the tent, go over his head, go by the stand, go with your friend, go to the boss, go on a hunch, and so on.
I was particularly excited when I came across the Panopticon, a word wheel for constructing sentences in Basic English. This was a mechanically operated combinatorial algorithm with seven nested disks of cardboard. You could rotate the disks in any combination and obtain a valid Basic English sentence. This certainly suggested a computable algorithm behind the operation of Basic English.
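The Panopticon’s operation is easy to express as a computable algorithm. Here is a minimal Python sketch; be warned that the ring contents below are illustrative stand-ins of my own choosing, assembled from Basic English words, not the actual word lists printed on the device:

```python
from math import prod

# Seven nested disks, one word class per ring. These particular word
# lists are hypothetical; the real Panopticon's rings are not
# reproduced in this essay.
rings = [
    ["I", "You", "We"],            # actor
    ["will", "may"],               # auxiliary
    ["go", "come", "give"],        # one of the operator verbs
    ["to", "with", "for"],         # preposition
    ["the"],                       # article
    ["good", "old"],               # quality
    ["house", "friend", "train"],  # thing
]

def sentence(positions):
    """One setting of the seven disks -> one valid sentence."""
    return " ".join(ring[p % len(ring)] for ring, p in zip(rings, positions))

print(sentence((0, 0, 0, 0, 0, 0, 0)))  # I will go to the good house

# Every disk setting is grammatical, so the wheel enumerates the
# product of the ring sizes:
print(prod(len(ring) for ring in rings))  # 324
```

Because every ring position yields a valid sentence, the total sentence count is simply the product of the ring sizes, which is what makes the wheel such a suggestive demonstration of a combinatorial grammar.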
Although the tiny vocabulary makes Basic English appear to be an ideal basis for human/computer interface, the hidden gotcha in the language is its heavy reliance on idiom. The verb phrases "go over his head" or "go on a hunch" are not literally parsable; they are idioms, not logical extensions of the verb "go". Moreover, Basic English suffers from the same drawback that current parsers have: if you can understand some common English, but not all common English, how does the user know where the line is drawn? What happens to the feelings of the user who repeatedly trips over that line?
Solresol is one of the oldest artificial languages, and certainly one of the most fascinating; I think that it offers many interesting ideas for current designers. The language was designed in 1817 by Francois Sudre. It is based on the seven notes of the musical scale: do re mi fa sol la ti. The use of only seven symbols instead of the two dozen or so in conventional alphabets has profound implications. First, unlike conventional alphabets, any combination of letters is usable. Thus, even with so few letters, a large vocabulary can be built up without overly long words; a vocabulary of more than 100,000 six-letter words is possible.
Moreover, by confining the alphabet to only seven letters, we gain multimodality of expression, because other human sensory organs, not so finely attuned as the ear, can now be used with the language. Consider the many modalities of expression available to such a language: we could replace each symbol with a single letter (i.e., do re mi fa sol la ti becomes d r m f s l t) for greater conciseness. For computer terms, the whole thing maps cleanly to octal, with the zero reserved for a space. Moreover, you can use the musical notes in place of the pronounced syllables; you can sing solresol rather than speak it. Or you can assign a color to each of the seven symbols (red, orange, yellow, green, blue, purple, and brown), thereby making it possible to denote words as sequences of colors.
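These mappings are mechanical enough to sketch in a few lines of Python. The single-letter abbreviations (d r m f s l t) and the octal scheme (digits 1 through 7, with 0 reserved for the space) follow the text above; the tokenizer itself is my own construction:

```python
SYLLABLES = ["do", "re", "mi", "fa", "sol", "la", "ti"]
LETTERS = "drmfslt"  # single-letter forms: d r m f s l t

def tokenize(word):
    """Split a solresol word like 'solresol' into its syllables."""
    out, i = [], 0
    while i < len(word):
        for syl in ("sol", "do", "re", "mi", "fa", "la", "ti"):
            if word.startswith(syl, i):
                out.append(syl)
                i += len(syl)
                break
        else:
            raise ValueError(f"not a solresol word: {word!r}")
    return out

def to_letters(word):
    """'milasolsol' -> 'mlss': one letter per syllable."""
    return "".join(LETTERS[SYLLABLES.index(s)] for s in tokenize(word))

def to_octal(word):
    """Digits 1-7 per syllable; 0 is reserved for the space."""
    return "".join(str(SYLLABLES.index(s) + 1) for s in tokenize(word))

print(tokenize("solresol"))      # ['sol', 're', 'sol']
print(to_letters("milasolsol"))  # mlss
print(to_octal("solresol"))      # 525
```

Incidentally, 7 to the 6th power is 117,649, which is where the figure of more than 100,000 six-letter words comes from.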
Such a language could readily be used by the handicapped. For the deaf, a truly simple finger-spelling language is possible, with three fingers reserved for punctuation and the others assigned to the syllables. The blind, of course, could sing it or hear it, but a form of braille would be much easier to design and support. Consider also the universality of this system; once the language has been defined, all users, whatever their handicap, have equal access to the language. A blind person can sing and hear in solresol almost as quickly and efficiently as a deaf person can type and read; and both have complete access to the exact same language that regular people use.
This gets really exciting when we start thinking about the interface between humans and computers. Remember the interface gap between GUIs and natural language? Something like this falls right into the middle of that gap. Yes, it’s more complicated than a GUI but it’s a lot less complicated than a natural language. Consider the ways in which input could be handled. We could use a standard keyboard to type out the syllables or just the letters. Even better, we could have a simple hand-held device, more like a mouse than a keyboard, that contains all the buttons needed. Four finger buttons and a thumb-shift key would allow complete expression in solresol.
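To illustrate, here is a hypothetical chord table for such a device, sketched in Python; the particular button assignments are pure invention on my part:

```python
# Each chord is a tuple of the four finger buttons (1 = pressed).
# These assignments are hypothetical, chosen only to show the idea.
CHORDS = {
    (1, 0, 0, 0): "do",
    (0, 1, 0, 0): "re",
    (0, 0, 1, 0): "mi",
    (0, 0, 0, 1): "fa",
    (1, 1, 0, 0): "sol",
    (0, 1, 1, 0): "la",
    (0, 0, 1, 1): "ti",
}

def decode(chord_sequence):
    """Translate a sequence of finger chords into a solresol word."""
    return "".join(CHORDS[chord] for chord in chord_sequence)

# "sol" followed by "re" spells the word "solre":
print(decode([(1, 1, 0, 0), (0, 1, 0, 0)]))  # solre
```

Four buttons yield 15 distinct nonzero chords, so the seven syllables use less than half the chord space even before the thumb-shift key is brought into play for punctuation or an alternate bank.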
But that’s not the only form of input. We could just as easily create a far simpler form of voice recognition than anything we have operating today, and the problems of voice discrimination would be dramatically reduced. It should not be difficult to achieve speaker-independence with so simple a system. And any speaker having difficulty getting through to the computer could always sing out the scale (do re mi fa sol la ti do) as a means of synchronizing the computer to the peculiarities of his voice.
Now let’s talk about output. The computer could speak, sing, draw, or color its output, or all of the above. A user could tell the computer to confine output to any combination of the sensory modalities.
But now we come to the really ugly problem with any such language: the vocabulary. How are we going to get people to learn a vocabulary with thousands of words such as milasolsol or tilatido? The trick, I think, is to grow the language from scratch. Remember, every combination of syllables makes sense. So why don’t we start off with the seven single syllables representing some of the simplest and most fundamental interface words, like so:
do: no
re: whatever is pointed to by the mouse
mi: get, open
fa: do, run, launch
sol: cancel, abort
la: save, close
ti: yes
These seven should not be difficult to learn, and all by themselves, they would greatly facilitate user interface. In other words, if we implemented a voice-driven command language that used just these seven words, it would be immediately useful.
But we don’t have to stop there. Next, we define the 49 two-syllable words at about the level of DOS commands: format, print, quit application, switch window, and so forth. Right there you have a functioning operating system with 56 commands that addresses all the basic needs of a computer system. Next, you add the 343 three-syllable words and you’ve got yourself a system with more expressive power than most GUIs offer. A decade or two down the road, after you’ve developed a solid basis of user experience, you start using the 2,401 four-syllable words.
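The tier sizes above are just powers of seven, which a few lines of Python make explicit:

```python
from itertools import product

SYLLABLES = ["do", "re", "mi", "fa", "sol", "la", "ti"]

def words_of_length(n):
    """All solresol words of exactly n syllables."""
    return ["".join(w) for w in product(SYLLABLES, repeat=n)]

print(len(words_of_length(1)))  # 7
print(len(words_of_length(2)))  # 49
print(len(words_of_length(3)))  # 343
print(len(words_of_length(4)))  # 2401

# Cumulative command space through four syllables:
print(sum(7 ** n for n in range(1, 5)))  # 2800
```

Note how the first two tiers together give the 56 commands mentioned above, and the full four-syllable vocabulary already exceeds anything a GUI could plausibly present.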
Granted, nobody is going to learn all those words from scratch. But right now computing doesn’t need several thousand words of expression; it’s getting along just fine with a few hundred. People can start off learning a few basic commands and expand their vocabulary only as their needs expand. Occasional users need only learn to speak two-syllable words. More frequent users might learn many of the three-syllable words. And the power users can boast of their mastery of the four-syllable words. There’s no reason why this language can’t continue to grow for decades, into five- and even six-syllable word vocabularies. The learning curve for this language is smooth and shallow and there’s no ceiling.
Realize too that a solresol language of computer interface need not replace any existing interfaces; it can be tacked onto the existing GUIs. I can easily imagine a graphic artist sitting in front of keyboard, mouse, and microphone, painting an intricate picture with the mouse, but singing out "la" when she wants to save the file, "soldo" when she wants to print it, and "mifare" when she wants to change palettes. And wouldn’t it be wonderful to be able to dismiss all those idiotic "are you really sure you want to do this?" messages by singing "do" for no and "ti" for yes? Twenty years from now, that same graphic artist might be saying "domisolfa" when she wants a Gaussian smoothing applied to her image -- but she never bothers to learn that "domisolre" tells an email application to encrypt a file before transmitting it. She doesn’t need to learn that stuff.
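Tacking solresol onto an existing GUI amounts to little more than a dispatch table from recognized words to application actions. Here is a minimal sketch, in which the word-to-action pairings follow the examples above but the handler functions themselves are hypothetical:

```python
# Hypothetical handlers standing in for real application actions.
def save_file():
    print("saving...")

def print_file():
    print("printing...")

def change_palette():
    print("switching palettes...")

# Word-to-action pairings taken from the examples in the text:
# "la" = save, "soldo" = print, "mifare" = change palettes.
COMMANDS = {
    "la": save_file,
    "soldo": print_file,
    "mifare": change_palette,
}

def on_spoken_word(word):
    """Dispatch a recognized solresol word to its GUI action."""
    action = COMMANDS.get(word)
    if action is None:
        return False  # unknown word: silently ignored
    action()
    return True

on_spoken_word("la")  # prints "saving..."
```

Note the design choice in the last few lines: an unrecognized word is simply ignored rather than rebuked, which sidesteps the "Huh?" problem that makes partial natural language parsers so frustrating.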
Hardware manufacturers can offer solresol peripherals: seven-key input pads, smart microphones that recognize solresol words, or the solresol mouse described earlier.
Oh yes, I didn’t mention the intrinsic internationality of solresol. With the Internet taking on a truly international character, isn’t it time to stop burdening it with English parochialism? A solresol language could be expanded into an international form of communication, finally satisfying the dreams of those early language-designers.
Which brings us to the "Academie Francaise" objection. The French have this crazy notion that their language should be defined and controlled by a committee of "top people" who pass judgement on the "Frenchiness" of any word or expression. Anglo-Saxon barbarisms such as "hamburger" and "rock ’n roll" are replaced with suitably Frenchified terms such as "deux tout-boeuf patois en la baguette a la sesame seed" or "Elvis Presley". We Anglo-Saxons, happy in our anarchistic squalor, just keep making up new terms and linguistic fads, because this provides our lexicographers with full employment. If we’re going to invent a language for human-computer interface, who will define and control the language? Some "Academie Solresol" of "top people" who impose their linguistic tastes on the computing public?
Truth be told, we’ve had such academies from the very beginning; the biggest is known as "Academie du Microsoft" and not only do they impose their notions of user interface on the computing community, but we pay them billions of dollars for the privilege!
This does raise the possibility that Microsoft might not do the job right. Anybody who has used Windows can imagine what havoc this company could wreak on the musical scale. Would they define an eight-note scale, just to make it cleanly octal? Would they name that eighth note "bill"? What if they decided to replace "mi" with a C# because Bill can’t hit a B-note? As we have seen so many times before, the opportunities for screwing up are manifold. While the theoretical possibilities of a little language like solresol are exhilarating, the likely realities of implementation are disheartening.
Despite my cynicism, I’m very excited about the possibilities of a solresol-type language. This could be the next big jump in user interface design. Yes, we’re likely to screw it up, and some greedy bastard will probably figure out a way to use our concepts to enrich himself, but that’s what happened to Gutenberg, James Watt, Alexander Bell, and Alan Kay; why should we be any different? The important thing at this point is to figure out if this idea is worth exploring. Hey, maybe I’m crazy; maybe this idea just won’t work. For now, my intention is to place this idea before the community and ask if others like its possibilities.
I hereby disclaim any proprietary interest in the ideas presented in this essay. I am placing these proposals in the public domain, and waive any rights to control or derive benefit from them. Some ideas can and should be personal property, but this is not one of them. Its success requires the joint efforts of many people, and there’s no obvious way to provide financial inducements for all those people. The only beneficiaries of solresol will be the users and, thus, the industry as a whole.
So I put the ball into your court. Is this crazy or brilliant? I’ll monitor feedback and, if interest warrants it, I’ll put together whatever organization, formal or informal, that the community seems to desire.