Computing Basic Story Structures

March 10, 2022

I am motivated to write this essay by a paper published in 2016 in which the authors analyze 1,327 stories stored in Project Gutenberg’s archive, attempting to discover basic underlying architectures.

History of analysis of story structures
People have been doing this for a long time; it is obvious that many stories are similar and that these similarities suggest fundamental underlying architectures. The paper provides a good list of these previous attempts in Appendix A, which I shamelessly copy and paste here:


  • Three plots: In his 1959 book, Foster-Harris contends that there are three basic patterns of plot (extending from the one central pattern of conflict): the happy ending, the unhappy ending, and the tragedy [39]. In these three versions, the outcome of the story hinges on the nature and fortune of a central character: virtuous, selfish, or struck by fate, respectively.
  • Seven plots: A notion often espoused as early as elementary school in the United States is that plots revolve around the conflict of an individual with either (1) him or herself, (2) nature, (3) another individual, (4) the environment, (5) technology, (6) the supernatural, or (7) a higher power [40].
  • Seven plots: Representing over three decades of work, Christopher Booker’s The Seven Basic Plots: Why We Tell Stories describes in great detail seven narrative structures [24]:
      – Overcoming the monster (e.g., Beowulf).
      – Rags to riches (e.g., Cinderella).
      – The quest (e.g., King Solomon’s Mines).
      – Voyage and return (e.g., The Time Machine).
      – Comedy (e.g., A Midsummer Night’s Dream).
      – Tragedy (e.g., Anna Karenina).
      – Rebirth (e.g., Beauty and the Beast).

    In addition to these seven, Booker contends that unhappy endings of all but the tragedy are also possible.

  • Twenty plots: In 20 Master Plots, Ronald Tobias proposes plots that include “quest”, “underdog”, “metamorphosis”, “ascension”, and “descension” [41].

  • Thirty-six plots: In a translation by Lucille Ray, Georges Polti attempts to reconstruct the 36 plots that he posits Gozzi originally enumerated [42]. These are quite specific and include “rivalry of kinsmen”, “all sacrificed for passion”, both involuntary and voluntary “crimes of love” (with many more on this theme), “pursuit”, and “falling prey to cruelty or misfortune”.


The new approach
It should be obvious from this list that these taxonomies are highly subjective. The authors therefore took an empirical approach, analyzing the texts of actual stories. They carried out impressively sophisticated statistical analyses of the “sentiment scores” of long sections of each story’s text, producing graphs of how the sentiment score varied over the course of the story. Here, for example, is Figure 2 from their paper:

This shows how the overall tone of happiness varied through the course of the story. They prepared similar graphs for many other stories. Next, they carried out a further analysis that somehow (I don’t quite understand their algorithmic strategy here) produced an overall arc for the story. They then analyzed these arcs and generated six “fundamental arcs” from their analysis. Here are these six arcs, showing the arcs of individual stories in black and the average of all in red:

They characterized each of these six arcs as follows:

• “Rags to riches” (rise).
• “Tragedy”, or “Riches to rags” (fall).
• “Man in a hole” (fall-rise).
• “Icarus” (rise-fall).
• “Cinderella” (rise-fall-rise).
• “Oedipus” (fall-rise-fall).
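The paper’s actual pipeline is far more elaborate than anything below and is not reproduced here. Still, the two basic ideas, a sentiment arc built from scores over long stretches of text and a small set of arc shapes defined by rises and falls, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the word list `LEXICON` and its scores are invented, the window sizes are arbitrary, and the helper names `sentiment_arc`, `smooth`, `directions`, and `classify` are hypothetical, not the authors’ code. Each arc is labelled by collapsing its successive differences into runs of rises (+) and falls (-) and looking the result up among the six shapes listed above.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> happiness score on a 1-9 scale.
# The words and scores are invented for illustration only.
LEXICON = {
    "love": 8.4, "happy": 8.3, "win": 7.8, "home": 7.1,
    "lost": 2.8, "fear": 2.5, "war": 1.8, "death": 1.5,
}

# The six arc shapes, keyed by their sequence of rises (+) and falls (-).
SHAPES = {
    ("+",): "Rags to riches",
    ("-",): "Tragedy",
    ("-", "+"): "Man in a hole",
    ("+", "-"): "Icarus",
    ("+", "-", "+"): "Cinderella",
    ("-", "+", "-"): "Oedipus",
}


def sentiment_arc(words, window=4, step=1):
    """Mean lexicon score of each sliding window of `window` words.

    Words missing from the lexicon are ignored, and a window containing
    no scored words is skipped.
    """
    arc = []
    for start in range(0, len(words) - window + 1, step):
        scores = [LEXICON[w] for w in words[start:start + window] if w in LEXICON]
        if scores:
            arc.append(mean(scores))
    return arc


def smooth(arc, points=3):
    """Coarse-grain a noisy arc into `points` chunk averages."""
    if len(arc) < points:
        raise ValueError("arc is shorter than the requested number of points")
    n = len(arc)
    return [mean(arc[i * n // points:(i + 1) * n // points]) for i in range(points)]


def directions(points):
    """Collapse successive differences into runs of rise (+) and fall (-)."""
    runs = []
    for a, b in zip(points, points[1:]):
        if b == a:
            continue  # flat steps do not change the shape
        d = "+" if b > a else "-"
        if not runs or runs[-1] != d:
            runs.append(d)
    return tuple(runs)


def classify(points):
    """Name the arc's shape, or 'unclassified' if it matches none of the six."""
    return SHAPES.get(directions(points), "unclassified")


words = "home love lost death fear love win happy".split()
arc = sentiment_arc(words, window=2, step=2)
print(arc)            # four window averages: a fall, then two rises
print(classify(arc))  # Man in a hole
```

Real arcs computed over whole novels wiggle constantly, so without heavy smoothing (for instance with `smooth`) almost every story would come out “unclassified”. How much to smooth is itself a judgment call, which is one place where subjectivity creeps back into an ostensibly empirical method.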

My previous work
About twenty years ago, I attempted a subjective version of the same concept. My approach was preliminary and microscopic in scope compared to the work of the authors of this paper, but it differed in one important respect: whereas this paper relied on a single variable, happiness, I thought it necessary to use three: happiness, danger, and a third variable that I cannot recall. (My memory of this work is weak.) I sketched rough graphs of how these variables shifted during the course of the story, and concluded that the idea was an interesting but only slightly illuminating way of thinking about story structure.

Further considerations
I doubt that the authors of this paper consider their work definitive in the slightest; it is obviously a first foray into a complicated world. I offer here some considerations on how this kind of work might profitably be expanded.

Multiple variables
It should be obvious that the single variable “happiness” is inadequate to capture the richness of many stories. In the movie “The Empire Strikes Back”, the episode in which Luke stays with Yoda on the planet Dagobah is not happy at all, yet it bears no resemblance to the episode in which Han Solo is encased in carbonite. Neither episode is happy, yet they are radically different in nature, so happiness alone cannot differentiate them. What other variable could? Here we encounter a problem very similar to the one we face in devising a personality model for characters in stories: we must find a set of variables that are orthogonal and that span the vector space of possibilities. I have not given much thought to this problem. Happiness is certainly a good variable, but we need another capable of differentiating the two aforementioned episodes. Danger? Conflict? What is it about Luke’s experience with Yoda that is important? Character growth? Maturation? Gaining strength? This problem demands a lot of thought and experimentation.
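The argument can be made concrete by treating each episode as a point in a space of variables. The sketch below assumes invented scores on three candidate variables; “danger” and “growth” are hypothetical picks drawn from the questions above, not an established model, and the names `EPISODES`, `distance`, and `correlation` are illustrative. On happiness alone the two episodes coincide; adding the other variables pulls them apart. The `correlation` helper speaks to the orthogonality requirement: a Pearson correlation of zero across many scored episodes means the mean-centred score columns are orthogonal, while a value near ±1 means a candidate variable largely duplicates another.

```python
import math

# Invented scores (0 = none, 1 = maximal) on three candidate variables.
# "danger" and "growth" are hypothetical choices, not an established model.
EPISODES = {
    "Luke trains with Yoda": {"happiness": 0.3, "danger": 0.2, "growth": 0.9},
    "Han frozen in carbonite": {"happiness": 0.3, "danger": 0.9, "growth": 0.1},
}


def distance(a, b, variables):
    """Euclidean distance between two episodes, using only `variables`."""
    return math.sqrt(sum((a[v] - b[v]) ** 2 for v in variables))


def correlation(xs, ys):
    """Pearson correlation of two score columns.

    A value of 0 means the mean-centred columns are orthogonal; a value
    near +1 or -1 means one variable largely duplicates the other.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


yoda = EPISODES["Luke trains with Yoda"]
carbonite = EPISODES["Han frozen in carbonite"]
print(distance(yoda, carbonite, ["happiness"]))  # 0.0: indistinguishable
print(distance(yoda, carbonite, ["happiness", "danger", "growth"]))  # roughly 1.06
```

With only two episodes the correlation check has nothing to work on; it becomes useful once many episodes from many stories have been scored, as a test of whether a proposed variable adds anything that happiness does not already capture.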

Ontogeny and phylogeny of stories
One of the grander simplifications of this paper is its implicit assumption that stories are timeless and universal. In truth, stories vary greatly across a number of dimensions: ontogenetic, phylogenetic, cultural, and sexual.

The ontogenetic variation is easiest to understand. During the course of our lives we undergo an evolution in our appreciation of stories. The stories of childhood give way to increasingly complex stories.

Phylogeny of stories
Storytelling has evolved over the course of time, going through five major phases. The first of these I shall call “tribal storytelling”, during which stories were, in effect, the property of the tribe, and were generally transmitted from the older generation to the younger generation. Anybody could tell the story, but normally precedence was awarded to the oldest or best storyteller. Stories were not the product of individual artistic creativity, but were instead a cultural treasure that was passed from generation to generation. Many of these stories survived into modern times as folktales and were recorded by scholars during the nineteenth and early twentieth centuries. One effort to discern some kind of structure in these stories was the compilation of folktale catalogues and of classification schemes derived from them, such as the Aarne-Thompson-Uther Index of folktale types.

In the second phase, stories became so lengthy that they were entrusted to professional bards who memorized them. Storytelling became so specialized that a society could support only a handful of these professionals, who travelled from place to place, telling their stories in return for food, lodging, and gifts. This in turn allowed the stories to grow ever longer. Later, when writing became common enough, some of these stories were written down; this is how the Iliad and the Odyssey, the story of Gilgamesh, some of the early stories in the Bible, and the Mahabharata were preserved. This phase lasted from the period of early agriculture until about 1500 CE. (Even though writing was common by 500 BCE, books were so expensive that, while they could record stories, they could not replace professional storytellers in dispersed communities.)

The third phase began some decades after the invention of the printing press. At first, books were still too rare and expensive to support much of a market for stories, but William Caxton printed Le Morte d’Arthur in 1485, and it was a big hit. This opened the doors to other printed works of fiction. However, printed books remained too expensive for the masses, so novels were still limited in reach.

With the passage of time, printing became more efficient, paper became less expensive, and books became increasingly accessible to larger portions of the population. The most striking step in this process came in the mid-19th century, with the success of Charles Dickens. His “software” greatly expanded the popularity of the “hardware” (printed books) and ushered in mass literacy. The spread of cheap mass-market paperbacks in the twentieth century expanded the scale of published stories even further. This was the fourth phase of storytelling.

The fifth phase began early in the twentieth century with the rise of cinema, but truly exploded in the 1950s with television. Here was a medium that needed stories en masse. During the 1950s and 1960s, about 500 movies were produced each year, but television required stories by the gross. Soap operas like “Search for Tomorrow”, westerns like “The Roy Rogers Show”, and sitcoms like “I Love Lucy” ground out thousands of stories each year. I was a young boy during the 1950s, and I recall the family gathering around the TV every evening to watch. We’d see two or three different stories every night. The quantity and quality of the TV stories increased every year. By the time I went off to college and stopped watching TV, I had seen many thousands of stories.

Each of these phases changed the nature of storytelling. In the first phase (tribal storytelling), stories had to be short enough for easy recollection, and often contained valuable information about natural, social, and personal relationships. In the second phase (bards), stories shifted to place greater emphasis on the heroism that well-off nobility preferred to fantasize about. In the third phase (early printing), there was a bit of a shift towards poking fun at various institutions (e.g., Erasmus, Cervantes, and Swift). In the fourth phase (late printing), there was a huge expansion of topics addressed in printed storytelling: romance, comedy, satire, tragedy, social commentary, and much more.

The fifth phase (television) produced a highly story-literate audience, requiring more condensed and complicated stories. This was first pointed out to me by one of the researchers who advised me to watch the opening sequence of Raiders of the Lost Ark:


Jump ahead to 3:02 and watch closely, particularly the 3-second sequence starting at 3:08. There are five different shots in those three seconds:

1. Hand grabs whip
2. Hand swings whip over head
3. Hand swings whip forward as assistant reacts
4. Long shot of all three people
5. Villain reacts in pain

Now, all that action happened too fast for my parents’ generation to follow, but people of my generation had no problem understanding it. Video is speeding up; I find some of the most modern video to be running a bit too fast for my taste.

This compression allows a great deal more storytelling to be squeezed into the available time. For example, try watching any one-hour episode of the 1960s version of The Twilight Zone. You’ll find it slow-going, and you’ll be wishing that it would speed up. That’s because you have become used to the fast pace of modern video.