I read in the Economist that AI technology has finally blasted off using something they now call “deep learning”. I developed something very similar, and in some important ways different, more than 40 years ago.
I’m not kidding; in my freshman year I was encouraged by my psychology professor to read up on the work of a Russian psychologist named Luria who had treated brain-injured Russian soldiers after World War II. With mountains of data to work with, Luria convincingly showed that the brain is highly plastic — many of the well-established maps of brain function turned out to be wrong. A patient could relearn behaviors that were supposedly controlled by portions of the brain that had been destroyed. This is now universally recognized, and many therapies have been developed to help brain-damaged patients recover some of their behaviors.
Thinking about it, I realized that the brain had to be able to re-route signals. This required a different design. Here’s what I came up with:
I was working with this diagram in 1970; it is an idealized representation of a set of neurons. At the time, this concept had already been discussed, but somebody had prepared a mathematical theorem “proving” that it wouldn’t work. I showed it to a few people, but they told me about the theorem and warned me that it was a dead end. I wasn’t convinced, and continued developing the idea. I realized that there were three crucial numbers defining any such system. (I coined the term “schemolic” to describe this kind of system).
The first critical number is the typical threshold of a neuron in the system. In the above layout, the typical threshold is two; a neuron needs two inputs to trigger. But I knew that, in any real system, the typical threshold would be much higher.
The second critical number is the number of layers. This diagram shows two layers. There’s no reason why it couldn’t be three layers, or five, or any number of layers.
The third critical number is the number of connections per neuron. In the above diagram, there are three connections per neuron. Again, I knew that, in any real system, there would be many more connections.
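The three critical numbers above can be sketched in code. This is a minimal illustration, assuming binary fire/don’t-fire neurons; the layer size, the specific wiring, and all the names here are my own illustrative choices, not anything from the original diagram. The threshold of two and three connections per neuron match the text.

```python
import random

THRESHOLD = 2        # a neuron fires when at least 2 of its inputs are active
CONNECTIONS = 3      # connections (inputs) per neuron, as in the diagram
LAYER_SIZE = 6       # assumed layer width, purely illustrative

def step_layer(inputs, wiring, threshold=THRESHOLD):
    """One layer's update: each neuron fires iff enough of its
    wired input sources are currently active."""
    return [sum(inputs[src] for src in conns) >= threshold
            for conns in wiring]

random.seed(0)
# Each neuron draws CONNECTIONS random sources from the previous layer.
wiring = [random.sample(range(LAYER_SIZE), CONNECTIONS)
          for _ in range(LAYER_SIZE)]

inputs = [True, True, False, True, False, False]
outputs = step_layer(inputs, wiring)
```

Stacking more layers is just repeated calls to `step_layer`, which is why the layer count is the second critical number.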
While in grad school in the mid-70s, I developed the idea even further, and wrote some computer programs to model its behavior. In so doing, I made several improvements. First, I dumped the whole idea of layers. The brain doesn’t have layers. Neurons have connections all over the place in the brain. So I went to a new design in which any neuron in the system could have an output to any other neuron. The connections were assigned randomly during program initialization. As far as I know, AI researchers have not yet cottoned on to this notion — but I really haven’t been following their work.
Second, I made a lot of connections between neurons. I think that my initial design had a thousand neurons with a hundred connections per neuron. A randomly selected subset of the system made up the “output”, and another randomly selected subset comprised the “input”.
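The layerless initialization described above might look like this. The counts of 1000 neurons and 100 connections per neuron come from the text; the size of the input and output subsets is my assumption, as is everything else in the sketch.

```python
import random

random.seed(1)

N_NEURONS = 1000      # from the text: about a thousand neurons
N_CONNECTIONS = 100   # from the text: about a hundred connections per neuron
N_IO = 20             # assumed size of the random input and output subsets

# Any neuron can connect to any other: each neuron draws 100 random
# source neurons from anywhere in the system, assigned at initialization.
connections = [random.sample(range(N_NEURONS), N_CONNECTIONS)
               for _ in range(N_NEURONS)]

# Two disjoint randomly selected subsets serve as "input" and "output".
shuffled = random.sample(range(N_NEURONS), 2 * N_IO)
input_set, output_set = shuffled[:N_IO], shuffled[N_IO:]
```

Note that there is no layer structure anywhere in this wiring: signals can loop back through the network, which is what lets activity reverberate through the whole system.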
Third, I started off with low thresholds. This is based on my understanding of the human nervous system. It starts off with low thresholds, and behavior is sculpted by increasing thresholds. I never found any reason to believe that thresholds could be reduced.
Fourth, the learning mechanism is pain. My idea here is, I think, unorthodox. Pain is not some special signal. There is no “pain” channel in the nervous system. Instead, pain is transmitted through all the usual sensory channels, but consists only of the neurons firing at their maximum rate. The brightest light is painful; the loudest sound is painful; the hottest heat is painful; and so forth. We simply saturate the signal and that constitutes pain.
So all we need from there is a biochemical mechanism that causes neurons to increase their thresholds whenever they have recently fired and shortly thereafter get hit with a fully saturated set of inputs. After studying the biochemistry of the neuron, I could not put my finger on that mechanism, but I convinced myself that the idea is plausible, while the notion of a neuron lowering its threshold seems less likely.
Here’s how it works: a schemolic system is crunching away, with impulses darting around the network in waves that reverberate through the whole system. If the output set produces an undesirable result, this is communicated to the overall system as a sudden saturation of inputs. Suddenly all the neurons get swamped in inputs. Those that were quiescent weather the storm with no effects, but those neurons that had fired immediately prior to the painful event are now depleted and the saturation event ‘damages’ them, causing them to raise their thresholds. In other words, any neuron that contributed to the ‘mistake’ is changed so as to make it less likely to fire next time. Thinking of the system as a whole, those neurons that, as a group, made the mistake, will be less likely to make that mistake again. Those neurons that did not contribute to the mistake are unaffected.
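The learning rule described above can be sketched as a single update step. This is my own hedged rendering of the mechanism, assuming real-valued thresholds; the increment size is an arbitrary assumption, not a figure from the original programs.

```python
PAIN_INCREMENT = 0.5   # assumed step size for the threshold increase

def apply_pain(thresholds, fired_last_step):
    """When a saturating 'pain' event arrives, raise the threshold of
    every neuron that fired on the immediately preceding step; neurons
    that were quiescent weather the storm unchanged."""
    return [t + PAIN_INCREMENT if fired else t
            for t, fired in zip(thresholds, fired_last_step)]

thresholds = [2.0, 2.0, 2.0, 2.0]
fired = [True, False, True, False]
print(apply_pain(thresholds, fired))   # -> [2.5, 2.0, 2.5, 2.0]
```

The key property is selectivity: only the neurons that contributed to the mistake become harder to fire, so the group that produced the bad output is less likely to produce it again.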
As far as I know, the AI researchers haven’t yet cottoned on to this idea, either.
I programmed the system to be initially overconnected and under-thresholded. That is, any inputs would cause the system to overreact, triggering all outputs. This is why, when you tickle a baby, it waves its arms, shakes its legs, engages in facial expressions and vocalizes. It’s a whole-body reaction to a single input. As the baby learns, it sculpts its behavioral response downward by increasing thresholds.
I had this system running and capable of learning to respond to different input patterns with different output patterns, but it couldn’t handle much learning. It could learn a few patterns, but after just three or four patterns, its thresholds would rise so high that it would, in effect, die. That’s where I left the problem when I graduated.
I took it up again in 1983, while at Atari research. I used my little Atari computer to build a simulation of the system. By now, such systems were called “neural networks”. I didn’t make much progress.
Anyway, I have long wanted to return to the problem, but I have so many other projects, and the academics seem to be making good progress. I just wish that they’d apply all of the ideas that I had implemented 40 years ago:
1. full interconnections instead of layers
2. initially overconnected and under-thresholded
3. behavior modification by increasing thresholds of neurons that had fired immediately prior to the mistake
I’m not claiming priority here; that goes to the people who publish, and I didn’t make any effort to publish. After all, they wouldn’t have listened anyway.
This essay was slapped together in less than an hour’s time. It surely has some errors in it. If you see a significant error in it, please contact me so that I can correct it. Use the contact form at the top of the page.