Part 5 (1/2)

A mother rat will build a nest for her young even if she has never seen another rat in her lifetime.1 Similarly, a spider will spin a web, a caterpillar will create her own cocoon, and a beaver will build a dam, even if no contemporary ever showed them how to accomplish these complex tasks. That is not to say that these are not learned behaviors. It is just that these animals did not learn them in a single lifetime-they learned them over thousands of lifetimes. The evolution of animal behavior does constitute a learning process, but it is learning by the species, not by the individual, and the fruits of this learning process are encoded in DNA.

To appreciate the significance of the evolution of the neocortex, consider that it greatly sped up the process of learning (hierarchical knowledge) from thousands of years to months (or less). Even if millions of animals in a particular mammalian species failed to solve a problem (requiring a hierarchy of steps), it required only one to accidentally stumble upon a solution. That new method would then be copied and spread exponentially through the population.

We are now in a position to speed up the learning process by a factor of thousands or millions once again by migrating from biological to nonbiological intelligence. Once a digital neocortex learns a skill, it can transfer that know-how in minutes or even seconds. As one of many examples, at my first company, Kurzweil Computer Products (now Nuance Speech Technologies), which I founded in 1973, we spent years training a set of research computers to recognize printed letters from scanned documents, a technology called omni-font (any type font) optical character recognition (OCR). This particular technology has now been in continual development for almost forty years, with the current product called OmniPage from Nuance. If you want your computer to recognize printed letters, you don't need to spend years training it to do so, as we did-you can simply download the evolved patterns already learned by the research computers in the form of software. In the 1980s we began working on speech recognition, and that technology, which has also been in continuous development now for several decades, is part of Siri. Again, you can download in seconds the evolved patterns learned by the research computers over many years.

Ultimately we will create an artificial neocortex that has the full range and flexibility of its human counterpart. Consider the benefits. Electronic circuits are millions of times faster than our biological circuits. At first we will have to devote all of this speed increase to compensating for the relative lack of parallelism in our computers, but ultimately the digital neocortex will be much faster than the biological variety and will only continue to increase in speed.

When we augment our own neocortex with a synthetic version, we won't have to worry about how much additional neocortex can physically fit into our bodies and brains, as most of it will be in the cloud, like most of the computing we use today. I estimated earlier that we have on the order of 300 million pattern recognizers in our biological neocortex. That's as much as could be squeezed into our skulls even with the evolutionary innovation of a large forehead and with the neocortex taking about 80 percent of the available space. As soon as we start thinking in the cloud, there will be no natural limits-we will be able to use billions or trillions of pattern recognizers, basically whatever we need, and whatever the law of accelerating returns can provide at each point in time.

In order for a digital neocortex to learn a new skill, it will still require many iterations of education, just as a biological neocortex does, but once a single digital neocortex somewhere and at some time learns something, it can share that knowledge with every other digital neocortex without delay. We can each have our own private neocortex extenders in the cloud, just as we have our own private stores of personal data today.

Last but not least, we will be able to back up the digital portion of our intelligence. As we have seen, it is not just a metaphor to state that there is information contained in our neocortex, and it is frightening to contemplate that none of this information is backed up today. There is, of course, one way in which we do back up some of the information in our brains-by writing it down. The ability to transfer at least some of our thinking to a medium that can outlast our biological bodies was a huge step forward, but a great deal of data in our brains continues to remain vulnerable.

Brain Simulations

One approach to building a digital brain is to simulate precisely a biological one. For example, Harvard brain sciences doctoral student David Dalrymple (born in 1991) is planning to simulate the brain of a nematode (a roundworm).2 Dalrymple selected the nematode because of its relatively simple nervous system, which consists of about 300 neurons, and which he plans to simulate at the very detailed level of molecules. He will also create a computer simulation of its body as well as its environment so that his virtual nematode can hunt for (virtual) food and do the other things that nematodes are good at. Dalrymple says it is likely to be the first complete brain upload from a biological animal to a virtual one that lives in a virtual world. Like his simulated nematode, whether even biological nematodes are conscious is open to debate, although in their struggle to eat, digest food, avoid predators, and reproduce, they do have experiences to be conscious of.

At the opposite end of the spectrum, Henry Markram's Blue Brain Project is planning to simulate the human brain, including the entire neocortex as well as the old-brain regions such as the hippocampus, amygdala, and cerebellum. His planned simulations will be built at varying degrees of detail, up to a full simulation at the molecular level. As I reported in chapter 4, Markram has discovered a key module of several dozen neurons that is repeated over and over again in the neocortex, demonstrating that learning is done by these modules and not by individual neurons.

Markram's progress has been scaling up at an exponential pace. He simulated one neuron in 2005, the year the project was initiated. In 2008 his team simulated an entire neocortical column of a rat brain, consisting of 10,000 neurons. By 2011 this expanded to 100 columns, totaling a million cells, which he calls a mesocircuit. One controversy concerning Markram's work is how to verify that the simulations are accurate. In order to do this, these simulations will need to demonstrate the kind of learning that I discuss below.

He projects simulating an entire rat brain of 100 mesocircuits, totaling 100 million neurons and about a trillion synapses, by 2014. In a talk at the 2009 TED conference at Oxford, Markram said, "It is not impossible to build a human brain, and we can do it in 10 years."3 His most recent target for a full brain simulation is 2023.4 Markram and his team are basing their model on detailed anatomical and electrochemical analyses of actual neurons. Using an automated device they created called a patch-clamp robot, they are measuring the specific ion channels, neurotransmitters, and enzymes that are responsible for the electrochemical activity within each neuron. Their automated system was able to do thirty years of analysis in six months, according to Markram. It was from these analyses that they noticed the "Lego memory" units that are the basic functional units of the neocortex.

Actual and projected progress of the Blue Brain brain simulation project.

Significant contributions to the technology of robotic patch-clamping were made by MIT neuroscientist Ed Boyden, Georgia Tech mechanical engineering professor Craig Forest, and Forest's graduate student Suhasa Kodandaramaiah. They demonstrated an automated system with one-micrometer precision that can scan neural tissue at very close range without damaging the delicate membranes of the neurons. "This is something a robot can do that a human can't," Boyden commented.

To return to Markram's simulation, after simulating one neocortical column, Markram was quoted as saying, "Now we just have to scale it up."5 Scaling is certainly one big factor, but there is one other key hurdle, which is learning. If the Blue Brain Project brain is to "speak and have an intelligence and behave very much as a human does," which is how Markram described his goal in a BBC interview in 2009, then it will need to have sufficient content in its simulated neocortex to perform those tasks.6 As anyone who has tried to hold a conversation with a newborn can attest, there is a lot of learning that must be achieved before this is feasible.

The tip of the patch-clamping robot developed at MIT and Georgia Tech scanning neural tissue.

There are two obvious ways this can be done in a simulated brain such as Blue Brain. One would be to have the brain learn this content the way a human brain does. It can start out like a newborn human baby with an innate capacity for acquiring hierarchical knowledge and with certain transformations preprogrammed in its sensory preprocessing regions. But the learning that takes a biological infant to the point of being a person who can hold a conversation would need to occur in a comparable manner in the nonbiological version. The problem with that approach is that a brain that is being simulated at the level of detail anticipated for Blue Brain is not expected to run in real time until at least the early 2020s. Even running in real time would be too slow unless the researchers are prepared to wait a decade or two to reach intellectual parity with an adult human, although real-time performance will get steadily faster as computers continue to grow in price/performance.

The other approach is to take one or more biological human brains that have already gained sufficient knowledge to converse in meaningful language and to otherwise behave in a mature manner and copy their neocortical patterns into the simulated brain. The problem with this method is that it requires a noninvasive and nondestructive scanning technology of sufficient spatial and temporal resolution and speed to perform such a task quickly and completely. I would not expect such an "uploading" technology to be available until around the 2040s. (The computational requirement to simulate a brain at that degree of precision, which I estimate to be 10^19 calculations per second, will be available in a supercomputer, according to my projections, by the early 2020s; however, the necessary nondestructive brain scanning technologies will take longer.)

There is a third approach, which is the one I believe simulation projects such as Blue Brain will need to pursue. One can simplify molecular models by creating functional equivalents at different levels of specificity, ranging from my own functional algorithmic method (as described in this book) to simulations that are closer to full molecular simulations. The speed of learning can thereby be increased by a factor of hundreds or thousands depending on the degree of simplification used. An educational program can be devised for the simulated brain (using the functional model) that it can learn relatively quickly. Then the full molecular simulation can be substituted for the simplified model while still using its accumulated learning. We can then simulate learning with the full molecular model at a much slower speed.

American computer scientist Dharmendra Modha and his IBM colleagues have created a cell-by-cell simulation of a portion of the human visual neocortex comprising 1.6 billion virtual neurons and 9 trillion synapses, which is equivalent to a cat neocortex. It runs 100 times slower than real time on an IBM BlueGene/P supercomputer consisting of 147,456 processors. The work received the Gordon Bell Prize from the Association for Computing Machinery.

The purpose of a brain simulation project such as Blue Brain and Modha's neocortex simulations is specifically to refine and confirm a functional model. AI at the human level will principally use the type of functional algorithmic model discussed in this book. However, molecular simulations will help us to perfect that model and to fully understand which details are important. In my development of speech recognition technology in the 1980s and 1990s, we were able to refine our algorithms once the actual transformations performed by the auditory nerve and early portions of the auditory cortex were understood. Even if our functional model was perfect, understanding exactly how it is actually implemented in our biological brains will reveal important knowledge about human function and dysfunction.

We will need detailed data on actual brains to create biologically based simulations. Markram's team is collecting its own data. There are large-scale projects to gather this type of data and make it generally available to scientists. For example, Cold Spring Harbor Laboratory in New York has collected 500 terabytes of data by scanning a mammal brain (a mouse), which they made available in June 2012. Their project allows a user to explore a brain similarly to the way Google Earth allows one to explore the surface of the planet. You can move around the entire brain and zoom in to see individual neurons and their connections. You can highlight a single connection and then follow its path through the brain.

Sixteen sections of the National Institutes of Health have gotten together and sponsored a major initiative called the Human Connectome Project with $38.5 million of funding.7 Led by Washington University in St. Louis, the University of Minnesota, Harvard University, Massachusetts General Hospital, and the University of California at Los Angeles, the project seeks to create a similar three-dimensional map of connections in the human brain. The project is using a variety of noninvasive scanning technologies, including new forms of MRI, magnetoencephalography (measuring the magnetic fields produced by the electrical activity in the brain), and diffusion tractography (a method to trace the pathways of fiber bundles in the brain). As I point out in chapter 10, the spatial resolution of noninvasive scanning of the brain is improving at an exponential rate. The research by Van J. Wedeen and his colleagues at Massachusetts General Hospital showing a highly regular gridlike structure of the wiring of the neocortex that I described in chapter 4 is one early result from this project.

Oxford University computational neuroscientist Anders Sandberg (born in 1972) and Swedish philosopher Nick Bostrom (born in 1973) have written the comprehensive Whole Brain Emulation: A Roadmap, which details the requirements for simulating the human brain (and other types of brains) at different levels of specificity, from high-level functional models to simulating molecules.8 The report does not provide a timeline, but it does describe the requirements to simulate different types of brains at varying levels of precision in terms of brain scanning, modeling, storage, and computation. The report projects ongoing exponential gains in all of these areas of capability and argues that the requirements to simulate the human brain at a high level of detail are coming into place.

An outline of the technological capabilities needed for whole brain emulation, in Whole Brain Emulation: A Roadmap by Anders Sandberg and Nick Bostrom.

An outline of Whole Brain Emulation: A Roadmap by Anders Sandberg and Nick Bostrom.

Neural Nets

In 1964, at the age of sixteen, I wrote to Frank Rosenblatt (1928–1971), a professor at Cornell University, inquiring about a machine called the Mark 1 Perceptron. He had created it four years earlier, and it was described as having brainlike properties. He invited me to visit him and try the machine out.

The Perceptron was built from what he claimed were electronic models of neurons. Input consisted of values arranged in two dimensions. For speech, one dimension represented frequency and the other time, so each value represented the intensity of a frequency at a given point in time. For images, each point was a pixel in a two-dimensional image. Each point of a given input was randomly connected to the inputs of the first layer of simulated neurons. Every connection had an associated synaptic strength, which represented its importance, and which was initially set at a random value. Each neuron added up the signals coming into it. If the combined signal exceeded a particular threshold, the neuron fired and sent a signal to its output connection; if the combined input signal did not exceed the threshold, the neuron did not fire, and its output was zero. The output of each neuron was randomly connected to the inputs of the neurons in the next layer. The Mark 1 Perceptron had three layers, which could be organized in a variety of configurations. For example, one layer might feed back to an earlier one. At the top layer, the output of one or more neurons, also randomly selected, provided the answer. (For an algorithmic description of neural nets, see this endnote.)9
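The threshold-neuron scheme just described can be sketched in a few lines of code. This is a minimal illustration of one layer of neurons with random initial weights, not a model of the Mark 1's actual circuitry; all of the names and parameter values here are hypothetical.

```python
import random

def make_layer(n_inputs, n_neurons, threshold=1.0):
    # Random wiring: each neuron gets a randomly chosen synaptic
    # strength (weight) for every input, as in the Mark 1.
    weights = [[random.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    return weights, threshold

def fire_layer(layer, inputs):
    weights, threshold = layer
    # Each neuron sums its weighted inputs and fires (outputs 1) only
    # if the combined signal exceeds the threshold; otherwise it
    # outputs 0.
    return [1 if sum(w * x for w, x in zip(ws, inputs)) > threshold else 0
            for ws in weights]

random.seed(0)
layer1 = make_layer(n_inputs=4, n_neurons=3)
outputs = fire_layer(layer1, [1, 0, 1, 1])
print(outputs)  # a list of three values, each 0 or 1
```

A multilayer net simply feeds such a layer's outputs into the inputs of the next layer.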

Since the neural net wiring and synaptic weights are initially set randomly, the answers of an untrained neural net are also random. The key to a neural net, therefore, is that it must learn its subject matter, just like the mammalian brains on which it's supposedly modeled. A neural net starts out ignorant; its teacher-which may be a human, a computer program, or perhaps another, more mature neural net that has already learned its lessons-rewards the student neural net when it generates the correct output and punishes it when it does not. This feedback is in turn used by the student neural net to adjust the strength of each interneuronal connection. Connections that are consistent with the correct answer are made stronger. Those that advocate a wrong answer are weakened.

Over time the neural net organizes itself to provide the correct answers without coaching. Experiments have shown that neural nets can learn their subject matter even with unreliable teachers. If the teacher is correct only 60 percent of the time, the student neural net will still learn its lessons with an accuracy approaching 100 percent.
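The feedback process described above, strengthening connections consistent with the correct answer and weakening those that advocated a wrong one, can be sketched as the classic perceptron learning rule. The toy task here (logical OR) and every name in the code are illustrative assumptions, not anything Rosenblatt's machine was actually trained on.

```python
import random

def train_step(weights, bias, inputs, target, rate=0.1):
    # Forward pass: the neuron fires if its weighted sum clears the
    # threshold (represented here by the negated bias).
    fired = 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0
    # Feedback from the teacher: error is +1, 0, or -1, so connections
    # consistent with the right answer are strengthened and those that
    # advocated a wrong answer are weakened.
    error = target - fired
    weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    bias += rate * error
    return weights, bias, fired

random.seed(1)
weights = [random.uniform(-1, 1) for _ in range(2)]
bias = 0.0
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
for _ in range(50):
    for inputs, target in data:
        weights, bias, _ = train_step(weights, bias, inputs, target)

# After training, the net gives the correct answers without coaching
# (rate=0.0 leaves the weights untouched during evaluation).
results = [train_step(weights, bias, x, t, rate=0.0)[2] for x, t in data]
print(results)  # [0, 1, 1, 1]
```

Because OR is linearly separable, this single-neuron net is guaranteed to converge; the invariance problems discussed below are precisely the cases where no such simple separation exists.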

However, limitations in the range of material that the Perceptron was capable of learning quickly became apparent. When I visited Professor Rosenblatt in 1964, I tried simple modifications to the input. The system was set up to recognize printed letters, and would recognize them quite accurately. It did a fairly good job of autoassociation (that is, it could recognize the letters even if I covered parts of them), but fared less well with invariance (that is, generalizing over size and font changes, which confused it).

During the last half of the 1960s, these neural nets became enormously popular, and the field of ”connectionism” took over at least half of the artificial intelligence field. The more traditional approach to AI, meanwhile, included direct attempts to program solutions to specific problems, such as how to recognize the invariant properties of printed letters.

Another person I visited in 1964 was Marvin Minsky (born in 1927), one of the founders of the artificial intelligence field. Despite having done some pioneering work on neural nets himself in the 1950s, he was concerned with the great surge of interest in this technique. Part of the allure of neural nets was that they supposedly did not require programming-they would learn solutions to problems on their own. In 1965 I entered MIT as a student with Professor Minsky as my mentor, and I shared his skepticism about the craze for ”connectionism.”

In 1969 Minsky and Seymour Papert (born in 1928), the two cofounders of the MIT Artificial Intelligence Laboratory, wrote a book called Perceptrons, which presented a single core theorem: specifically, that a Perceptron was inherently incapable of determining whether or not an image was connected. The book created a firestorm. Determining whether or not an image is connected is a task that humans can do very easily, and it is also a straightforward process to program a computer to make this discrimination. The fact that Perceptrons could not do so was considered by many to be a fatal flaw.
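To see how straightforward the connectedness discrimination is for a conventional program, consider a simple flood fill: start at any dark pixel and spread to its neighbors; the image is connected exactly when the fill reaches every dark pixel. The image encoding below is, of course, an assumption made for illustration.

```python
def is_connected(image):
    # image: list of strings, '#' = dark pixel, '.' = background.
    dark = {(r, c) for r, row in enumerate(image)
            for c, ch in enumerate(row) if ch == '#'}
    if not dark:
        return True
    # Flood-fill outward from an arbitrary dark pixel.
    stack = [next(iter(dark))]
    seen = set()
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) in dark:
                stack.append((r + dr, c + dc))
    # Connected iff the fill reached every dark pixel.
    return seen == dark

print(is_connected(["##..", "....", "..##"]))  # False: two separate parts
print(is_connected(["####", "#..#", "####"]))  # True: one connected shape
```

A feedforward Perceptron has no way to perform this kind of iterative, global bookkeeping, which is exactly what Minsky and Papert's theorem formalized.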

Two images from the cover of the book Perceptrons by Marvin Minsky and Seymour Papert. The top image is not connected (that is, the dark area consists of two disconnected parts). The bottom image is connected. A human can readily determine this, as can a simple software program. A feedforward Perceptron such as Frank Rosenblatt's Mark 1 Perceptron cannot make this determination.

Perceptrons, however, was widely interpreted to imply more than it actually did. Minsky and Papert's theorem applied only to a particular type of neural net called a feedforward neural net (a category that does include Rosenblatt's Perceptron); other types of neural nets did not have this limitation. Still, the book did manage to largely kill most funding for neural net research during the 1970s. The field did return in the 1980s with attempts to use what were claimed to be more realistic models of biological neurons and ones that avoided the limitations implied by the Minsky-Papert Perceptron theorem. Nevertheless, the ability of the neocortex to solve the invariance problem, a key to its strength, was a skill that remained elusive for the resurgent connectionist field.

Sparse Coding: Vector Quantization

In the early 1980s I started a project devoted to another classical pattern recognition problem: understanding human speech. At first, we used traditional AI approaches by directly programming expert knowledge about the fundamental units of speech-phonemes-and rules from linguists on how people string phonemes together to form words and phrases. Each phoneme has distinctive frequency patterns. For example, we knew that vowels such as "e" and "ah" are characterized by certain resonant frequencies called formants, with a characteristic ratio of formants for each phoneme. Sibilant sounds such as "z" and "s" are characterized by a burst of noise that spans many frequencies.

We captured speech as a waveform, which we then converted into multiple frequency bands (perceived as pitches) using a bank of frequency filters. The result of this transformation could be visualized and was called a spectrogram (see page 136).
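The filter-bank transformation can be sketched in software: slide a window along the waveform and, for each position, measure the energy in a handful of frequency bands. This is a schematic stand-in for the filter hardware we actually used, with arbitrary frame and band counts chosen purely for illustration.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, step=32, n_bands=8):
    # For each frame, a discrete Fourier transform measures the
    # magnitude of each frequency band, acting as a software filter bank.
    frames = []
    for start in range(0, len(samples) - frame_size + 1, step):
        frame = samples[start:start + frame_size]
        bands = []
        for k in range(1, n_bands + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            bands.append(abs(coeff))
        frames.append(bands)
    return frames  # rows = points in time, columns = frequency bands

# A pure tone at 4 cycles per frame should light up band 4 everywhere.
tone = [math.sin(2 * math.pi * 4 * n / 64) for n in range(256)]
grid = spectrogram(tone)
peak_band = max(range(8), key=lambda b: grid[0][b])
print(peak_band + 1)  # 4
```

Plotting the resulting grid with time on one axis, band on the other, and magnitude as intensity yields the familiar spectrogram image.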