As an example, recall the metaphor I used in chapter 4 relating the random movements of molecules in a gas to the random movements of evolutionary change. Molecules in a gas move randomly with no apparent sense of direction. Despite this, virtually every molecule in a gas in a beaker, given sufficient time, will leave the beaker. I noted that this provides a perspective on an important question concerning the evolution of intelligence. Like molecules in a gas, evolutionary changes also move every which way with no apparent direction. Yet we nonetheless see a movement toward greater complexity and greater intelligence, indeed to evolution's supreme achievement of evolving a neocortex capable of hierarchical thinking. So we are able to gain an insight into how an apparently purposeless and directionless process can achieve an apparently purposeful result in one field (biological evolution) by looking at another field (thermodynamics).
I mentioned earlier how Charles Lyell's insight that minute changes to rock formations by streaming water could carve great valleys over time inspired Charles Darwin to make a similar observation about continual minute changes to the characteristics of organisms within a species. This metaphor search would be another continual background process.
We should include a means of stepping through multiple lists simultaneously to provide the equivalent of structured thought. A list might be the statement of the constraints that a solution to a problem must satisfy. Each step can generate a recursive search through the existing hierarchy of ideas or a search through available literature. The human brain appears to be able to handle only four simultaneous lists at a time (without the aid of tools such as computers), but there is no reason for an artificial neocortex to have such a limitation.
We will also want to enhance our artificial brains with the kind of intelligence that computers have always excelled in, which is the ability to master vast databases accurately and implement known algorithms quickly and efficiently. Wolfram Alpha uniquely combines a great many known scientific methods and applies them to carefully collected data. This type of system is also going to continue to improve given Dr. Wolfram's observation of an exponential decline in error rates.
Finally, our new brain needs a purpose. A purpose is expressed as a series of goals. In the case of our biological brains, our goals are established by the pleasure and fear centers that we have inherited from the old brain. These primitive drives were initially set by biological evolution to foster the survival of species, but the neocortex has enabled us to sublimate them. Watson's goal was to respond to Jeopardy! queries. Another simply stated goal could be to pass the Turing test. To do so, a digital brain would need a human narrative of its own fictional story so that it can pretend to be a biological human. It would also have to dumb itself down considerably, for any system that displayed the knowledge of, say, Watson would be quickly unmasked as nonbiological.
More interestingly, we could give our new brain a more ambitious goal, such as contributing to a better world. A goal along these lines, of course, raises a lot of questions: Better for whom? Better in what way? For biological humans? For all conscious beings? If that is the case, who or what is conscious?
As nonbiological brains become as capable as biological ones of effecting changes in the world-indeed, ultimately far more capable than unenhanced biological ones-we will need to consider their moral education. A good place to start would be with one old idea from our religious traditions: the golden rule.
CHAPTER 8
THE MIND AS COMPUTER
Shaped a little like a loaf of French country bread, our brain is a crowded chemistry lab, bustling with nonstop neural conversations. Imagine the brain, that shiny mound of being, that mouse-gray parliament of cells, that dream factory, that petit tyrant inside a ball of bone, that huddle of neurons calling all the plays, that little everywhere, that fickle pleasuredome, that wrinkled wardrobe of selves stuffed into the skull like too many clothes into a gym bag.
-Diane Ackerman

Brains exist because the distribution of resources necessary for survival and the hazards that threaten survival vary in space and time.
-John M. Allman

The modern geography of the brain has a deliciously antiquated feel to it-rather like a medieval map with the known world encircled by terra incognita where monsters roam.
-David Bainbridge

In mathematics you don't understand things. You just get used to them.
-John von Neumann
Ever since the emergence of the computer in the middle of the twentieth century, there has been ongoing debate not only about the ultimate extent of its abilities but about whether the human brain itself could be considered a form of computer. As far as the latter question was concerned, the consensus has veered from viewing these two kinds of information-processing entities as being essentially the same to their being fundamentally different. So is the brain a computer?
When computers first became a popular topic in the 1940s, they were immediately regarded as thinking machines. The ENIAC, which was announced in 1946, was described in the press as a “giant brain.” As computers became commercially available in the following decade, ads routinely referred to them as brains capable of feats that ordinary biological brains could not match.
A 1957 ad showing the popular conception of a computer as a giant brain.
Computer programs quickly enabled the machines to live up to this billing. The “general problem solver,” created in 1959 by Herbert A. Simon, J. C. Shaw, and Allen Newell at Carnegie Mellon University, was able to devise a proof to a theorem that mathematicians Bertrand Russell (1872–1970) and Alfred North Whitehead (1861–1947) had been unable to solve in their famous 1913 work Principia Mathematica. What became apparent in the decades that followed was that computers could readily exceed unassisted human capability in such intellectual exercises as solving mathematical problems, diagnosing disease, and playing chess, but had difficulty with controlling a robot tying shoelaces or with understanding the commonsense language that a five-year-old child could comprehend. Computers are only now starting to master these sorts of skills. Ironically, the evolution of computer intelligence has proceeded in the opposite direction of human maturation.
The issue of whether or not the computer and the human brain are at some level equivalent remains controversial today. In the introduction I mentioned that there were millions of links for quotations on the complexity of the human brain. Similarly, a Google inquiry for “Quotations: the brain is not a computer” also returns millions of links. In my view, statements along these lines are akin to saying, “Applesauce is not an apple.” Technically that statement is true, but you can make applesauce from an apple. Perhaps more to the point, it is like saying, “Computers are not word processors.” It is true that a computer and a word processor exist at different conceptual levels, but a computer can become a word processor if it is running word processing software and not otherwise. Similarly, a computer can become a brain if it is running brain software. That is what researchers including myself are attempting to do.
The question, then, is whether or not we can find an algorithm that would turn a computer into an entity that is equivalent to a human brain. A computer, after all, can run any algorithm that we might define because of its innate universality (subject only to its capacity). The human brain, on the other hand, is running a specific set of algorithms. Its methods are clever in that they allow for significant plasticity and the restructuring of its own connections based on its experience, but these functions can be emulated in software.
The universality of computation (the concept that a general-purpose computer can implement any algorithm)-and the power of this idea-emerged at the same time as the first actual machines. There are four key concepts that underlie the universality and feasibility of computation and its applicability to our thinking. They are worth reviewing here, because the brain itself makes use of them. The first is the ability to communicate, remember, and compute information reliably. Around 1940, if you used the word “computer,” people assumed you were talking about an analog computer, in which numbers were represented by different levels of voltage, and specialized components could perform arithmetic functions such as addition and multiplication. A big limitation of analog computers, however, was that they were plagued by accuracy issues. Numbers could only be represented with an accuracy of about one part in a hundred, and as voltage levels representing them were processed by increasing numbers of arithmetic operators, errors would accumulate. If you wanted to perform more than a handful of computations, the results would become so inaccurate as to be meaningless.
Anyone who can remember the days of recording music with analog tape machines will recall this effect. There was noticeable degradation on the first copy, as it was a little noisier than the original. (Remember that “noise” represents random inaccuracies.) A copy of the copy was noisier still, and by the tenth generation the copy was almost entirely noise. It was assumed that the same problem would plague the emerging world of digital computers. We can understand such concerns if we consider the communication of digital information through a channel. No channel is perfect and each one will have some inherent error rate. Suppose we have a channel that has a .9 probability of correctly transmitting each bit. If I send a message that is one bit long, the probability of accurately transmitting it through that channel will be .9. Suppose I send two bits? Now the accuracy is .9² = .81. How about if I send one byte (eight bits)? I have less than an even chance (.43 to be exact) of sending it correctly. The probability of accurately sending five bytes is about 1 percent.
An obvious solution to circumvent this problem is to make the channel more accurate. Suppose the channel makes only one error in a million bits. If I send a file consisting of a half million bytes (about the size of a modest program or database), the probability of correctly transmitting it is less than 2 percent, despite the very high inherent accuracy of the channel. Given that a single-bit error can completely invalidate a computer program and other forms of digital data, that is not a satisfactory situation. Regardless of the accuracy of the channel, since the likelihood of an error in a transmission grows rapidly with the size of the message, this would seem to be an intractable barrier.
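To make the arithmetic behind these two examples concrete, here is a minimal sketch in Python (my own illustration, with invented function names, not anything from the historical record) that reproduces the figures above.

```python
# Probability that an entire message arrives intact when each bit
# is transmitted correctly and independently with probability p.
def intact_probability(p_bit_correct: float, num_bits: int) -> float:
    return p_bit_correct ** num_bits

# A channel that is 90 percent reliable per bit:
print(intact_probability(0.9, 2))    # ~0.81  (two bits)
print(intact_probability(0.9, 8))    # ~0.43  (one byte)
print(intact_probability(0.9, 40))   # ~0.015 (five bytes, roughly 1 percent)

# A far better channel (one error per million bits) carrying a
# half-million-byte file: still under a 2 percent chance of success.
print(intact_probability(1 - 1e-6, 500_000 * 8))   # ~0.018
```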
Analog computers approached this problem through graceful degradation (meaning that users only presented problems in which they could tolerate small errors); however, if users of analog computers limited themselves to a constrained set of calculations, the computers did prove somewhat useful. Digital computers, on the other hand, require continual communication, not just from one computer to another, but within the computer itself. There is communication from its memory to and from the central processing unit. Within the central processing unit, there is communication from one register to another and back and forth to the arithmetic unit, and so forth. Even within the arithmetic unit, there is communication from one bit register to another. Communication is pervasive at every level. If we consider that error rates escalate rapidly with increased communication and that a single-bit error can destroy the integrity of a process, digital computation was doomed-or so it seemed at the time.
Remarkably, that was the common view until American mathematician Claude Shannon (1916–2001) came along and demonstrated how we can create arbitrarily accurate communication using even the most unreliable communication channels. What Shannon stated in his landmark paper “A Mathematical Theory of Communication,” published in the Bell System Technical Journal in July and October 1948, and in particular in his noisy channel-coding theorem, was that if you have available a channel with any error rate (except for exactly 50 percent per bit, which would mean that the channel was just transmitting pure noise), you are able to transmit a message in which the error rate is as accurate as you desire. In other words, the error rate of the transmission can be one bit out of n bits, where n can be as large as you define. So, for example, in the extreme, if you have a channel that correctly transmits bits of information only 51 percent of the time (that is, it transmits the correct bit just slightly more often than the wrong bit), you can nonetheless transmit messages such that only one bit out of a million is incorrect, or one bit out of a trillion or a trillion trillion.
How is this possible? The answer is through redundancy. That may seem obvious now, but it was not at the time. As a simple example, if I transmit each bit three times and take the majority vote, I will have substantially increased the reliability of the result. If that is not good enough, simply increase the redundancy until you get the reliability you need. Simply repeating information is the easiest way to achieve arbitrarily high accuracy rates from low-accuracy channels, but it is not the most efficient approach. Shannon's paper, which established the field of information theory, presented optimal methods of error detection and correction codes that can achieve any target accuracy through any nonrandom channel.
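To make the redundancy idea concrete, here is a minimal sketch in Python of the three-copies-and-majority-vote scheme just described; the simulation and its function names are my own illustration, not code from Shannon's paper.

```python
import random

def noisy_channel(bit: int, p_error: float) -> int:
    """Flip the bit with probability p_error, as an unreliable channel would."""
    return bit ^ 1 if random.random() < p_error else bit

def send_with_repetition(bit: int, p_error: float, copies: int = 3) -> int:
    """Transmit the bit several times and let the receiver take a majority vote."""
    received = [noisy_channel(bit, p_error) for _ in range(copies)]
    return 1 if sum(received) > copies // 2 else 0

# With a 10 percent per-bit error rate, a single transmission fails 10 percent
# of the time, but three copies plus a majority vote fail only when two or three
# copies are flipped: 3 * 0.1^2 * 0.9 + 0.1^3 = 2.8 percent. More copies drive
# the error rate lower still.
trials = 100_000
errors = sum(send_with_repetition(1, 0.1) != 1 for _ in range(trials))
print(errors / trials)   # roughly 0.028
```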
Older readers will recall telephone modems, which transmitted information through noisy analog phone lines. These lines featured audibly obvious hisses and pops and many other forms of distortion, but nonetheless were able to transmit digital data with very high accuracy rates, thanks to Shannon's noisy channel theorem. The same issue and the same solution exist for digital memory. Ever wonder how CDs, DVDs, and program disks continue to provide reliable results even after the disk has been dropped on the floor and scratched? Again, we can thank Shannon.
Computation consists of three elements: communication-which, as I mentioned, is pervasive both within and between computers-memory, and logic gates (which perform the arithmetic and logical functions). The accuracy of logic gates can also be made arbitrarily high by similarly using error detection and correction codes. It is due to Shannon's theorem and theory that we can handle arbitrarily large and complex digital data and algorithms without the processes being disturbed or destroyed by errors. It is important to point out that the brain uses Shannon's principle as well, although the evolution of the human brain clearly predates Shannon's own! Most of the patterns or ideas (and an idea is also a pattern), as we have seen, are stored in the brain with a substantial amount of redundancy. A primary reason for the redundancy in the brain is the inherent unreliability of neural circuits.
The second important idea on which the information age relies is the one I mentioned earlier: the universality of computation. In 1936 Alan Turing described his “Turing machine,” which was not an actual machine but another thought experiment. His theoretical computer consists of an infinitely long memory tape with a 1 or a 0 in each square. Input to the machine is presented on this tape, which the machine can read one square at a time. The machine also contains a table of rules-essentially a stored program-that consists of numbered states. Each rule specifies one action if the square currently being read is a 0, and a different action if the current square is a 1. Possible actions include writing a 0 or 1 on the tape, moving the tape one square to the right or left, or halting. Each state will then specify the number of the next state that the machine should be in.
The input to the Turing machine is presented on the tape. The program runs, and when the machine halts, it has completed its algorithm, and the output of the process is left on the tape. Note that even though the tape is theoretically infinite in length, any actual program that does not get into an infinite loop will use only a finite portion of the tape, so if we limit ourselves to a finite tape, the machine will still solve a useful set of problems.
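To show just how little machinery this requires, here is a minimal sketch of a Turing machine simulator in Python; the rule format and the sample program (which appends a 1 to a block of 1s, adding one to a number written in unary) are my own toy illustration, not Turing's notation.

```python
def run_turing_machine(rules, tape, state="start"):
    """Run a rule table of the form {(state, symbol): (write, move, next_state)}
    until the machine enters the special state 'halt'. The tape is stored as a
    dict so it can grow in either direction, standing in for the infinite tape."""
    tape = dict(enumerate(tape))
    head = 0
    while state != "halt":
        symbol = tape.get(head, 0)                  # a blank square reads as 0
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return [tape[i] for i in sorted(tape)]

# Rule table: skip right over the existing 1s, write one more 1 on the first
# blank square, and halt.
rules = {
    ("start", 1): (1, "R", "start"),
    ("start", 0): (1, "R", "halt"),
}
print(run_turing_machine(rules, [1, 1, 1]))   # [1, 1, 1, 1]: unary 3 becomes 4
```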
If the Turing machine sounds simple, it is because that was its inventor's objective. Turing wanted his machine to be as simple as possible (but no simpler, to paraphrase Einstein). Turing and Alonzo Church (1903–1995), his former professor, went on to develop the Church-Turing thesis, which states that if a problem that can be presented to a Turing machine is not solvable by it, it is also not solvable by any machine, following natural law. Even though the Turing machine has only a handful of commands and processes only one bit at a time, it can compute anything that any computer can compute. Another way to say this is that any machine that is “Turing complete” (that is, that has equivalent capabilities to a Turing machine) can compute any algorithm (any procedure that we can define).
A block diagram of a Turing machine with a head that reads and writes the tape and an internal program consisting of state transitions.
“Strong” interpretations of the Church-Turing thesis propose an essential equivalence between what a human can think or know and what is computable by a machine. The basic idea is that the human brain is likewise subject to natural law, and thus its information-processing ability cannot exceed that of a machine (and therefore of a Turing machine).
We can properly credit Turing with establishing the theoretical foundation of computation with his 1936 paper, but it is important to note that he was deeply influenced by a lecture that Hungarian American mathematician John von Neumann (1903–1957) gave in Cambridge in 1935 on his stored program concept, a concept enshrined in the Turing machine.1 In turn, von Neumann was influenced by Turing's 1936 paper, which elegantly laid out the principles of computation, and made it required reading for his colleagues in the late 1930s and early 1940s.2 In the same paper Turing reports another unexpected discovery: that of unsolvable problems. These are problems that are well defined with unique answers that can be shown to exist, but that we can also prove can never be computed by any Turing machine-that is to say, by any machine, a reversal of what had been a nineteenth-century dogma that problems that could be defined would ultimately be solved. Turing showed that there are as many unsolvable problems as solvable ones. Austrian American mathematician and philosopher Kurt Gödel reached a similar conclusion in his 1931 “incompleteness theorem.” We are thus left with the perplexing situation of being able to define a problem, to prove that a unique answer exists, and yet know that the answer can never be found.
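The classic modern example of such a problem is the halting problem: deciding, for an arbitrary program and input, whether that program will ever halt. The sketch below recasts the standard diagonal argument (closely related to, though not identical to, the construction in Turing's paper) as Python; it is a proof sketch rather than a working program, and every name in it is hypothetical by design.

```python
# Suppose, for contradiction, that some function could decide for any program
# and input whether that program eventually halts.
def halts(program, argument) -> bool:
    ...  # assumed to always return True or False, never to loop forever

# Then we could build this troublemaker:
def paradox(program):
    if halts(program, program):   # if the program would halt on its own source...
        while True:               # ...loop forever
            pass
    return                        # otherwise, halt immediately

# Asking whether paradox halts when fed its own source has no consistent answer:
# halts(paradox, paradox) is wrong whichever value it returns. So no such
# halts() can exist, even though the question it answers is perfectly well defined.
```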
Turing had shown that at its essence, computation is based on a very simple mechanism. Because the Turing machine (and therefore any computer) is capable of basing its future course of action on results it has already computed, it is capable of making decisions and modeling arbitrarily complex hierarchies of information.
In 1939 Turing designed an electromechanical calculator called Bombe that helped decode messages that had been encrypted by the Nazi Enigma coding machine. By 1943, an engineering team influenced by Turing completed what is arguably the first computer, the Colossus, which enabled the Allies to continue decoding messages from more sophisticated versions of Enigma. The Bombe and Colossus were designed for a single task and could not be reprogrammed for a different one. But they performed this task brilliantly and are credited with having enabled the Allies to overcome the three-to-one advantage that the German Luftwaffe enjoyed over the British Royal Air Force and win the crucial Battle of Britain, as well as to continue anticipating Nazi tactics throughout the war.
It was on these foundations that John von Neumann created the architecture of the modern computer, which represents our third major idea. Called the von Neumann machine, it has remained the core structure of essentially every computer for the past sixty-seven years, from the microcontroller in your washing machine to the largest supercomputers. In a paper dated June 30, 1945, and titled “First Draft of a Report on the EDVAC,” von Neumann presented the ideas that have dominated computation ever since.3 The von Neumann model includes a central processing unit, where arithmetical and logical operations are carried out; a memory unit, where the program and data are stored; mass storage; a program counter; and input/output channels. Although this paper was intended as an internal project document, it has become the bible for computer designers. You never know when a seemingly routine internal memo will end up revolutionizing the world.
The Turing machine was not designed to be practical. Turing's theorems were concerned not with the efficiency of solving problems but rather with examining the range of problems that could in theory be solved by computation. Von Neumann's goal, on the other hand, was to create a feasible concept of a computational machine. His model replaces Turing's one-bit computations with multiple-bit words (generally some multiple of eight bits). Turing's memory tape is sequential, so Turing machine programs spend an inordinate amount of time moving the tape back and forth to store and retrieve intermediate results. In contrast, von Neumann's memory is random access, so that any data item can be immediately retrieved.
One of von Neumann's key ideas is the stored program, which he had introduced a decade earlier: placing the program in the same type of random access memory as the data (and often in the same block of memory). This allows the computer to be reprogrammed for different tasks as well as for self-modifying code (if the program store is writable), which enables a powerful form of recursion. Up until that time, virtually all computers, including the Colossus, were built for a specific task. The stored program makes it possible for a computer to be truly universal, thereby fulfilling Turing's vision of the universality of computation.
Another key aspect of the von Neumann machine is that each instruction includes an operation code specifying the arithmetic or logical operation to be performed and the address of an operand from memory.
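To see how the stored program and the opcode-plus-operand instruction format fit together, here is a minimal sketch of a von Neumann-style fetch-decode-execute loop in Python; the tiny instruction set, the sample program, and every name in it are my own invention for illustration, not a description of the EDVAC or any real machine.

```python
# Program and data share one memory, as in the stored-program concept.
# Each instruction is an (opcode, operand_address) pair.
def run(memory, pc=0, acc=0):
    while True:
        opcode, addr = memory[pc]      # fetch the instruction the program counter points to
        pc += 1
        if opcode == "LOAD":           # copy a memory word into the accumulator
            acc = memory[addr]
        elif opcode == "ADD":          # add a memory word to the accumulator
            acc += memory[addr]
        elif opcode == "STORE":        # write the accumulator back to memory
            memory[addr] = acc
        elif opcode == "JUMP":         # set the program counter directly (branching)
            pc = addr
        elif opcode == "HALT":
            return memory

# A program that adds the values stored at addresses 5 and 6 and leaves the sum at address 7.
memory = [
    ("LOAD", 5),
    ("ADD", 6),
    ("STORE", 7),
    ("HALT", 0),
    None,        # unused cell
    2, 3, 0,     # data: operands at addresses 5 and 6, result at address 7
]
print(run(memory)[7])   # 5
```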
Von Neumann's concept of how a computer should be architected was introduced with his publication of the design of the EDVAC, a project he conducted with collaborators J. Presper Eckert and John Mauchly. The EDVAC itself did not actually run until 1951, by which time there were other stored-program computers, such as the Manchester Small-Scale Experimental Machine, ENIAC, EDSAC, and BINAC, all of which had been deeply influenced by von Neumann's paper and involved Eckert and Mauchly as designers. Von Neumann was a direct contributor to the design of a number of these machines, including a later version of ENIAC, which supported a stored program.
There were a few precursors to von Neumann's architecture, although with one surprising exception, none are true von Neumann machines. In 1944 Howard Aiken introduced the Mark I, which had an element of programmability but did not use a stored program. It read instructions from a punched paper tape and then executed each command immediately. It also lacked a conditional branch instruction.
In 1941 German scientist Konrad Zuse (1910–1995) created the Z-3 computer. It also read its program from a tape (in this case, coded on film) and also had no conditional branch instruction. Interestingly, Zuse had support from the German Aircraft Research Institute, which used the device to study wing flutter, but his proposal to the Nazi government for funding to replace his relays with vacuum tubes was turned down. The Nazis deemed computation “not war important.” That perspective goes a long way, in my view, toward explaining the outcome of the war.
There is actually one genuine forerunner to von Neumann's concept, and it comes from a full century earlier! English mathematician and inventor Charles Babbage's (1791–1871) Analytical Engine, which he first described in 1837, did incorporate von Neumann's ideas and featured a stored program via punched cards borrowed from the Jacquard loom.4 Its random access memory included 1,000 words of 50 decimal digits each (the equivalent of about 21 kilobytes). Each instruction included an op code and an operand number, just like modern machine languages. It did include conditional branching and looping, so it was a true von Neumann machine. It was based entirely on mechanical gears, and it appears that the Analytical Engine was beyond Babbage's design and organizational skills. He built parts of it but it never ran. It is unclear whether the twentieth-century pioneers of the computer, including von Neumann, were aware of Babbage's work.
Babbage's computer did result in the creation of the field of software programming. English writer Ada Byron (1815–1852), Countess of Lovelace and the only legitimate child of the poet Lord Byron, was the world's first computer programmer. She wrote programs for the Analytical Engine, which she needed to debug in her own mind (since the computer never worked), a practice well known to software engineers today as “table checking.” She translated an article by the Italian mathematician Luigi Menabrea on the Analytical Engine and added extensive notes of her own, writing that “the Analytical Engine weaves algebraic patterns, just as the Jacquard loom weaves flowers and leaves.” She went on to provide perhaps the first speculations on the feasibility of artificial intelligence, but concluded that the Analytical Engine has “no pretensions whatever to originate anything.”