8. In act three of an 1846 Verdi opera, this Scourge of God is stabbed to death by his lover, Odabella.
-Examples of Jeopardy! queries, all of which Watson got correct. Answers are: meringue harangue, pinafore, Grendel, gestate, May, skylark, shoe. For the eighth query, Watson replied, "What is Attila?" The host responded by saying, "Be more specific?" Watson clarified with, "What is Attila the Hun?," which is correct.
The computer's techniques for unraveling Jeopardy! clues sounded just like mine. That machine zeroes in on key words in a clue, then combs its memory (in Watson's case, a 15-terabyte data bank of human knowledge) for clusters of associations with these words. It rigorously checks the top hits against all the contextual information it can muster: the category name; the kind of answer being sought; the time, place, and gender hinted at in the clue; and so on. And when it feels "sure" enough, it decides to buzz. This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing.
-Ken Jennings, human Jeopardy! champion who lost to Watson
I, for one, welcome our new robot overlords.
-Ken Jennings, paraphrasing The Simpsons, after losing to Watson
Oh my God. [Watson] is more intelligent than the average Jeopardy! player in answering Jeopardy! questions. That's impressively intelligent.
-Sebastian Thrun, former director of the Stanford AI Lab
Watson understands nothing. It's a bigger steamroller.
-Noam Chomsky
Artificial intelligence is all around us-we no longer have our hand on the plug. The simple act of connecting with someone via a text message, e-mail, or cell phone call uses intelligent algorithms to route the information. Almost every product we touch is originally designed in a collaboration between human and artificial intelligence and then built in automated factories. If all the AI systems decided to go on strike tomorrow, our civilization would be crippled: We couldn't get money from our bank, and indeed, our money would disappear; communication, transportation, and manufacturing would all grind to a halt. Fortunately, our intelligent machines are not yet intelligent enough to organize such a conspiracy.
What is new in AI today is the viscerally impressive nature of publicly available examples. For example, consider Google's self-driving cars (which as of this writing have gone over 200,000 miles in cities and towns), a technology that will lead to significantly fewer crashes, increased capacity of roads, relief for humans from the chore of driving, and many other benefits. Driverless cars are already legal to operate on public roads in Nevada with some restrictions, although widespread usage by the public throughout the world is not expected until late in this decade. Technology that intelligently watches the road and warns the driver of impending dangers is already being installed in cars. One such technology is based in part on the successful model of visual processing in the brain created by MIT's Tomaso Poggio. Called MobilEye, it was developed by Amnon Shashua, a former postdoctoral student of Poggio's. It is capable of alerting the driver to such dangers as an impending collision or a child running in front of the car and has recently been installed in cars by such manufacturers as Volvo and BMW.
I will focus in this section of the book on language technologies for several reasons. Not surprisingly, the hierarchical nature of language closely mirrors the hierarchical nature of our thinking. Spoken language was our first technology, with written language as the second. My own work in artificial intelligence, as this chapter has demonstrated, has been heavily focused on language. Finally, mastering language is a powerfully leveraged capability. Watson has already read hundreds of millions of pages on the Web and mastered the knowledge contained in these documents. Ultimately machines will be able to master all of the knowledge on the Web-which is essentially all of the knowledge of our human-machine civilization.
English mathematician Alan Turing (1912–1954) based his eponymous test on the ability of a computer to converse in natural language using text messages.13 Turing felt that all of human intelligence was embodied and represented in language, and that no machine could pass a Turing test through simple language tricks. Although the Turing test is a game involving written language, Turing believed that the only way that a computer could pass it would be for it to actually possess the equivalent of human-level intelligence. Critics have proposed that a true test of human-level intelligence should include mastery of visual and auditory information as well.14 Since many of my own AI projects involve teaching computers to master such sensory information as human speech, letter shapes, and musical sounds, I would be expected to advocate the inclusion of these forms of information in a true test of intelligence. Yet I agree with Turing's original insight that the text-only version of the Turing test is sufficient. Adding visual or auditory input or output to the test would not actually make it more difficult to pass.
One does not need to be an AI expert to be moved by the performance of Watson on Jeopardy! Although I have a reasonable understanding of the methodology used in a number of its key subsystems, that does not diminish my emotional reaction to watching it-him?-perform. Even a perfect understanding of how all of its component systems work-which no one actually has-would not help you to predict how Watson would actually react to a given situation. It contains hundreds of interacting subsystems, and each of these is considering millions of competing hypotheses at the same time, so predicting the outcome is impossible. Doing a thorough analysis-after the fact-of Watson's deliberations for a single three-second query would take a human centuries.
To continue my own history, in the late 1980s and 1990s we began working on natural-language understanding in limited domains. You could speak to one of our products, called Kurzweil Voice, about anything you wanted, so long as it had to do with editing documents. (For example, "Move the third paragraph on the previous page to here.") It worked pretty well in this limited but useful domain. We also created systems with medical domain knowledge so that doctors could dictate patient reports. These systems had enough knowledge of fields such as radiology and pathology to question the doctor if something in the report seemed unclear, and would guide the physician through the reporting process. These medical reporting systems have evolved into a billion-dollar business at Nuance.
Understanding natural language, especially as an extension to automatic speech recognition, has now entered the mainstream. As of the writing of this book, Siri, the automated personal assistant on the iPhone 4S, has created a stir in the mobile computing world. You can pretty much ask Siri to do anything that a self-respecting smartphone should be capable of doing (for example, "Where can I get some Indian food around here?" or "Text my wife that I'm on my way," or "What do people think of the new Brad Pitt movie?"), and most of the time Siri will comply. Siri will entertain a small amount of nonproductive chatter. If you ask her what the meaning of life is, she will respond with "42," which fans of The Hitchhiker's Guide to the Galaxy will recognize as its "answer to the ultimate question of life, the universe, and everything." Knowledge questions (including the one about the meaning of life) are answered by Wolfram Alpha, described on page 170. There is a whole world of "chatbots" who do nothing but engage in small talk. If you would like to talk to our chatbot named Ramona, go to our Web site KurzweilAI.net and click on "Chat with Ramona."
Some people have complained to me about Siri's failure to answer certain requests, but I often recall that these are the same people who persistently complain about human service providers as well. I sometimes suggest that we try it together, and often it works better than they expect. The complaints remind me of the story of the dog who plays chess. To an incredulous questioner, the dog's owner replies, "Yeah, it's true, he does play chess, but his endgame is weak." Effective competitors are now emerging, such as Google Voice Search.
That the general public is now having conversations in natural spoken language with their handheld computers marks a new era. It is typical that people dismiss the significance of a first-generation technology because of its limitations. A few years later, when the technology does work well, people still dismiss its importance because, well, it's no longer new. That being said, Siri works impressively for a first-generation product, and it is clear that this category of product is only going to get better.
Siri uses the HMM-based speech recognition technologies from Nuance. The natural-language extensions were first developed by the DARPA-funded "CALO" project.15 Siri has been enhanced with Nuance's own natural-language technologies, and Nuance offers a very similar technology called Dragon Go!16 The methods used for understanding natural language are very similar to hierarchical hidden Markov models, and indeed HHMM itself is commonly used. Whereas some of these systems are not specifically labeled as using HMM or HHMM, the mathematics is virtually identical. They all involve hierarchies of linear sequences where each element has a weight, connections that are self-adapting, and an overall system that self-organizes based on learning data. Usually the learning continues during actual use of the system. This approach matches the hierarchical structure of natural language-it is just a natural extension up the conceptual ladder from parts of speech to words to phrases to semantic structures. It would make sense to run a genetic algorithm on the parameters that control the precise learning algorithm of this class of hierarchical learning systems and determine the optimal algorithmic details.
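To make this concrete, here is a minimal sketch of decoding with a (non-hierarchical) hidden Markov model, the building block these systems extend: hidden states with transition weights emit observed words, and the Viterbi algorithm finds the most probable hidden sequence. The states, words, and probabilities below are invented for illustration; real systems learn them from data.

```python
# Toy HMM: hidden part-of-speech states emit observed words.
# All probabilities here are invented for illustration.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1, "food": 0.4},
          "VERB": {"dogs": 0.1, "bark": 0.8, "food": 0.1}}

def viterbi(words):
    """Return the most probable hidden state sequence for the words."""
    # Each trellis cell holds (probability, best path ending in this state).
    trellis = [{s: (start_p[s] * emit_p[s][words[0]], [s]) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            prob, path = max(
                (trellis[-1][prev][0] * trans_p[prev][s] * emit_p[s][word],
                 trellis[-1][prev][1] + [s])
                for prev in states)
            column[s] = (prob, path)
        trellis.append(column)
    return max(trellis[-1].values())[1]

print(viterbi(["dogs", "bark"]))  # -> ['NOUN', 'VERB']
```

An HHMM stacks such models: the "words" at one level are themselves the decoded outputs of the level below.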
Over the past decade there has been a shift in the way that these hierarchical structures are created. In 1984 Douglas Lenat (born in 1950) started the ambitious Cyc (for enCYClopedic) project, which aimed to create rules that would codify everyday "commonsense" knowledge. The rules were organized in a huge hierarchy, and each rule involved-again-a linear sequence of states. For example, one Cyc rule might state that a dog has a face. Cyc can then link to general rules about the structure of faces: that a face has two eyes, a nose, and a mouth, and so on. We don't need to have one set of rules for a dog's face and then another for a cat's face, though we may of course want to put in additional rules for ways in which dogs' faces differ from cats' faces. The system also includes an inference engine: If we have rules that state that a cocker spaniel is a dog, that dogs are animals, and that animals eat food, and if we were to ask the inference engine whether cocker spaniels eat, the system would respond that yes, cocker spaniels eat food. Over the next twenty years, and with thousands of person-years of effort, over a million such rules were written and tested. Interestingly, the language for writing Cyc rules-called CycL-is almost identical to LISP.
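The flavor of this hierarchy-plus-inference idea can be sketched in a few lines. This is an illustrative toy in Python, not actual CycL; the facts are the ones from the example above, and the inference engine simply walks up the "is-a" chain, collecting inherited properties.

```python
# Toy inference over an "is-a" hierarchy, in the spirit of Cyc's
# hand-coded rules (an illustration, not CycL syntax).
is_a = {"cocker spaniel": "dog", "dog": "animal"}
properties = {"dog": ["has a face"], "animal": ["eats food"]}

def infer_properties(concept):
    """Collect properties of a concept and everything it inherits from."""
    found = []
    while concept is not None:
        found += properties.get(concept, [])
        concept = is_a.get(concept)  # walk up the hierarchy
    return found

print(infer_properties("cocker spaniel"))  # -> ['has a face', 'eats food']
```

Because "eats food" is stated once at the animal level, every concept below it inherits the rule without duplication, which is exactly the economy the hierarchy buys.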
Meanwhile, an opposing school of thought believed that the best approach to natural-language understanding, and to creating intelligent systems in general, was through automated learning from exposure to a very large number of instances of the phenomena the system was trying to master. A powerful example of such a system is Google Translate, which can translate to and from fifty languages. That's 2,500 different translation directions, although for most language pairs, rather than translate language 1 directly into language 2, it will translate language 1 into English and then English into language 2. That reduces the number of translators Google needed to build to ninety-eight (plus a limited number of non-English pairs for which there is direct translation). The Google translators do not use grammatical rules; rather, they create vast databases for each language pair of common translations based on large "Rosetta stone" corpora of translated documents between two languages. For the six languages that constitute the official languages of the United Nations, Google has used United Nations documents, as they are published in all six languages. For less common languages, other sources have been used.
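A toy illustration of pivoting through English, with made-up word-level dictionaries standing in for the statistical phrase tables that are actually learned from corpora, along with the arithmetic behind the "ninety-eight translators" figure:

```python
# Illustration of pivot translation through English. The word-level
# "translators" here are invented stand-ins for learned phrase tables.
to_english = {"fr": {"chien": "dog"}, "de": {"Hund": "dog"}}
from_english = {"fr": {"dog": "chien"}, "de": {"dog": "Hund"}}

def translate(word, src, tgt):
    """Translate src -> tgt by pivoting through English."""
    english = word if src == "en" else to_english[src][word]
    return english if tgt == "en" else from_english[tgt][english]

print(translate("chien", "fr", "de"))  # -> Hund

# With N languages, pivoting needs only 2*(N-1) translators
# (each language to and from English) instead of N*(N-1) direct pairs:
n = 50
print(2 * (n - 1), "vs", n * (n - 1))  # -> 98 vs 2450
```

The cost of pivoting is that errors compound through the intermediate English step, which is one reason direct translators are still built for a few high-traffic non-English pairs.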
The results are often impressive. DARPA runs annual compet.i.tions for the best automated language translation systems for different language pairs, and Google Translate often wins for certain pairs, outperforming systems created directly by human linguists.
Over the past decade two major insights have deeply influenced the natural-language-understanding field. The first has to do with hierarchies. Although the Google approach started with association of flat word sequences from one language to another, the inherent hierarchical nature of language has inevitably crept into its operation. Systems that methodically incorporate hierarchical learning (such as hierarchical hidden Markov models) provided significantly better performance. However, such systems are not quite as automatic to build. Just as humans need to learn approximately one conceptual hierarchy at a time, the same is true for computerized systems, so the learning process needs to be carefully managed.
The other insight is that hand-built rules work well for a core of common basic knowledge. For translations of short passages, this approach often provides more accurate results. For example, DARPA has rated rule-based Chinese-to-English translators higher than Google Translate for short passages. For what is called the tail of a language, which refers to the millions of infrequent phrases and concepts used in it, the accuracy of rule-based systems approaches an unacceptably low asymptote. If we plot natural-language-understanding accuracy against the amount of training data analyzed, rule-based systems have higher performance initially but level off at fairly low accuracies of about 70 percent. In sharp contrast, statistical systems can reach the high 90s in accuracy but require a great deal of data to achieve that.
Often we need a combination of at least moderate performance on a small amount of training data and then the opportunity to achieve high accuracies with a more significant quantity. Achieving moderate performance quickly enables us to put a system in the field and then to automatically collect training data as people actually use it. In this way, a great deal of learning can occur at the same time that the system is being used, and its accuracy will improve. The statistical learning needs to be fully hierarchical to reflect the nature of language, which also reflects how the human brain works.
This is also how Siri and Dragon Go! work-using rules for the most common and reliable phenomena and then learning the "tail" of the language in the hands of real users. When the Cyc team realized that they had reached a ceiling of performance based on hand-coded rules, they too adopted this approach. Hand-coded rules provide two essential functions. First, they offer adequate initial accuracy, so that a trial system can be placed into widespread usage, where it will improve automatically. Second, they provide a solid basis for the lower levels of the conceptual hierarchy so that the automated learning can begin to learn higher conceptual levels.
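The tradeoff described above can be caricatured with two invented curves: rules that plateau around 70 percent regardless of data, and a statistical learner that approaches the high 90s only with large amounts of data. The functional forms and constants here are illustrative assumptions, not measurements; the point is that a hybrid inherits the better of the two at every data size.

```python
import math

# Invented curves caricaturing the core-versus-tail tradeoff:
# rule-based systems start strong but plateau near 70%; statistical
# learners start weak but approach ~97% as training data grows.
def rule_based_accuracy(examples):
    return 0.70 * (1 - math.exp(-examples / 100))

def statistical_accuracy(examples):
    return 0.97 * (1 - math.exp(-examples / 100_000))

def hybrid_accuracy(examples):
    # Field the rules first, then let statistics take over the "tail."
    return max(rule_based_accuracy(examples), statistical_accuracy(examples))

for n in (1_000, 1_000_000):
    print(n, round(hybrid_accuracy(n), 2))
```

With a thousand examples the hybrid rides the rule-based curve near 0.70; with a million it rides the statistical curve near 0.97.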
As mentioned above, Watson represents a particularly impressive example of the approach of combining hand-coded rules with hierarchical statistical learning. IBM combined a number of leading natural-language programs to create a system that could play the natural-language game of Jeopardy! On February 14–16, 2011, Watson competed with the two leading human players: Brad Rutter, who had won more money than anyone else on the quiz show, and Ken Jennings, who had previously held the Jeopardy! championship for the record time of seventy-five days.
By way of context, I had predicted in my first book, The Age of Intelligent Machines, written in the mid-1980s, that a computer would take the world chess championship by 1998. I also predicted that when that happened, we would either downgrade our opinion of human intelligence, upgrade our opinion of machine intelligence, or downplay the importance of chess, and that if history was a guide, we would minimize chess. Both of these things happened in 1997. When IBM's chess supercomputer Deep Blue defeated the reigning human world chess champion, Garry Kasparov, we were immediately treated to arguments that it was to be expected that a computer would win at chess because computers are logic machines, and chess, after all, is a game of logic. Thus Deep Blue's victory was judged to be neither surprising nor significant. Many of its critics went on to argue that computers would never master the subtleties of human language, including metaphors, similes, puns, double entendres, and humor.
The accuracy of natural-language-understanding systems as a function of the amount of training data. The best approach is to combine rules for the ”core” of the language and a data-based approach for the ”tail” of the language.
That is at least one reason why Watson represents such a significant milestone: Jeopardy! is precisely such a sophisticated and challenging language task. Typical Jeopardy! queries include many of these vagaries of human language. What is perhaps not evident to many observers is that Watson not only had to master the language in the unexpected and convoluted queries, but for the most part its knowledge was not hand-coded. It obtained that knowledge by actually reading 200 million pages of natural-language documents, including all of Wikipedia and other encyclopedias, comprising 4 trillion bytes of language-based knowledge. As readers of this book are well aware, Wikipedia is not written in LISP or CycL, but rather in natural sentences that have all of the ambiguities and intricacies inherent in language. Watson needed to consider all 4 trillion characters in its reference material when responding to a question. (I realize that Jeopardy! queries are answers in search of a question, but this is a technicality-they ultimately are really questions.)
If Watson can understand and respond to questions based on 200 million pages-in three seconds!-there is nothing to stop similar systems from reading the other billions of documents on the Web. Indeed, that effort is now under way.
When we were developing character and speech recognition systems and early natural-language-understanding systems in the 1970s through 1990s, we used a methodology of incorporating an ”expert manager.” We would develop multiple systems to do the same thing but would incorporate somewhat different approaches in each one. Some of the differences were subtle, such as variations in the parameters controlling the mathematics of the learning algorithm. Some variations were fundamental, such as including rule-based systems instead of hierarchical statistical learning systems. The expert manager was itself a software program that was programmed to learn the strengths and weaknesses of these different systems by examining their performance in real-world situations. It was based on the notion that these strengths were orthogonal; that is, one system would tend to be strong where another was weak. Indeed, the overall performance of the combined systems with the trained expert manager in charge was far better than any of the individual systems.
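A sketch of the expert-manager idea, with hypothetical subsystem names and invented statistics: the manager tracks each subsystem's observed accuracy in the field and weights its votes accordingly, so orthogonal strengths combine into a result better than any single system's.

```python
from collections import defaultdict

# Sketch of an "expert manager" that learns which subsystems to trust.
# The subsystem names and track records are invented for illustration.
class ExpertManager:
    def __init__(self):
        self.right = defaultdict(lambda: 1)  # Laplace-style smoothing
        self.total = defaultdict(lambda: 2)

    def record(self, system, was_correct):
        """Update a subsystem's track record from real-world feedback."""
        self.total[system] += 1
        if was_correct:
            self.right[system] += 1

    def weight(self, system):
        return self.right[system] / self.total[system]

    def decide(self, votes):
        """votes: {system_name: answer}. Pick the answer with the most
        weighted support across subsystems."""
        scores = defaultdict(float)
        for system, answer in votes.items():
            scores[answer] += self.weight(system)
        return max(scores, key=scores.get)

manager = ExpertManager()
for _ in range(8):
    manager.record("hmm_recognizer", True)    # consistently right
    manager.record("rule_recognizer", False)  # consistently wrong
print(manager.decide({"hmm_recognizer": "shoe", "rule_recognizer": "boot"}))
# -> shoe
```

Note that the weights keep adapting during use, mirroring the observation that learning continued while the deployed systems were in the field.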
Watson works the same way. Using an architecture called UIMA (Unstructured Information Management Architecture), Watson deploys literally hundreds of different systems-many of the individual language components in Watson are the same ones that are used in publicly available natural-language-understanding systems-all of which are attempting to either directly come up with a response to the Jeopardy! query or else at least provide some disambiguation of the query. UIMA is basically acting as the expert manager to intelligently combine the results of the independent systems. UIMA goes substantially beyond earlier systems, such as the one we developed in the predecessor company to Nuance, in that its individual systems can contribute to a result without necessarily coming up with a final answer. It is sufficient if a subsystem helps narrow down the solution. UIMA is also able to compute how much confidence it has in the final answer. The human brain does this also-we are probably very confident of our response when asked for our mother's first name, but we are less so in coming up with the name of someone we met casually a year ago.
Thus rather than come up with a single elegant approach to understanding the language problem inherent in Jeopardy! the IBM scientists combined all of the state-of-the-art language-understanding modules they could get their hands on. Some use hierarchical hidden Markov models; some use mathematical variants of HHMM; others use rule-based approaches to code directly a core set of reliable rules. UIMA evaluates the performance of each system in actual use and combines them in an optimal way. There is some misunderstanding in the public discussions of Watson in that the IBM scientists who created it often focus on UIMA, which is the expert manager they created. This leads to comments by some observers that Watson has no real understanding of language because it is difficult to identify where this understanding resides. Although the UIMA framework also learns from its own experience, Watson's "understanding" of language cannot be found in UIMA alone but rather is distributed across all of its many components, including the self-organizing language modules that use methods similar to HHMM.
A separate part of Watson's technology uses UIMA's confidence estimate in its answers to determine how to place Jeopardy! bets. While the Watson system is specifically optimized to play this particular game, its core language- and knowledge-searching technology can easily be adapted to other broad tasks. One might think that less commonly shared professional knowledge, such as that in the medical field, would be more difficult to master than the general-purpose "common" knowledge that is required to play Jeopardy! Actually, the opposite is the case: Professional knowledge tends to be more highly organized, structured, and less ambiguous than its commonsense counterpart, so it is highly amenable to accurate natural-language understanding using these techniques. As mentioned, IBM is currently working with Nuance to adapt the Watson technology to medicine.
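As an illustration of how a confidence estimate can drive such decisions (this is invented for the example, not Watson's actual wagering strategy): buzz only when the expected value of answering is positive, and scale a wager with confidence.

```python
# Illustrative decision rules driven by a confidence estimate.
# These thresholds and formulas are assumptions for the example,
# not Watson's actual strategy.
def should_buzz(confidence, clue_value):
    # Win clue_value if right, lose clue_value if wrong.
    expected_value = confidence * clue_value - (1 - confidence) * clue_value
    return expected_value > 0

def daily_double_wager(confidence, bankroll, max_fraction=0.5):
    # Risk a larger share of the bankroll the more confident we are.
    return round(bankroll * max_fraction * confidence)

print(should_buzz(0.8, 1000))           # -> True
print(should_buzz(0.4, 1000))           # -> False
print(daily_double_wager(0.9, 10_000))  # -> 4500
```

Real wagering must also account for the opponents' scores and the stage of the game, which is why this remains a deliberately simplified sketch.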
The conversation that takes place when Watson is playing Jeopardy! is a brief one: A question is posed, and Watson comes up with an answer. (Again, technically, it comes up with a question to respond to an answer.) It does not engage in a conversation that would require tracking all of the earlier statements of all participants. (Siri actually does do this to a limited extent: If you ask it to send a message to your wife, it will ask you to identify her, but it will remember who she is for subsequent requests.) Tracking all of the information in a conversation-a task that would clearly be required to pass the Turing test-is a significant additional requirement but not fundamentally more difficult than what Watson is doing already. After all, Watson has read hundreds of millions of pages of material, which obviously includes many stories, so it is capable of tracking through complicated sequential events. It should therefore be able to follow its own conversations and take that into consideration in its subsequent replies.
Another limitation of the Jeopardy! game is that the answers are generally brief: It does not, for example, pose questions of the sort that ask contestants to name the five primary themes of A Tale of Two Cities. To the extent that it can find documents that do discuss the themes of this novel, a suitably modified version of Watson should be able to respond to this. Coming up with such themes on its own from just reading the book, and not essentially copying the thoughts (even without the words) of other thinkers, is another matter. Doing so would constitute a higher-level task than Watson is capable of today-it is what I call a Turing test–level task. (That being said, I will point out that most humans do not come up with their own original thoughts either but copy the ideas of their peers and opinion leaders.) At any rate, this is 2012, not 2029, so I would not expect Turing test–level intelligence yet. On yet another hand, I would point out that evaluating the answers to questions such as finding key ideas in a novel is itself not a straightforward task. If someone is asked who signed the Declaration of Independence, one can determine whether or not her response is true or false. The validity of answers to higher-level questions such as describing the themes of a creative work is far less easily established.
It is noteworthy that although Watson's language skills are actually somewhat below those of an educated human, it was able to defeat the best two Jeopardy! players in the world. It could accomplish this because it is able to combine its language ability and knowledge understanding with the perfect recall and highly accurate memories that machines possess. That is why we have already largely assigned our personal, social, and historical memories to them.
Although I'm not prepared to move up my prediction of a computer passing the Turing test by 2029, the progress that has been achieved in systems like Watson should give anyone substantial confidence that the advent of Turing-level AI is close at hand. If one were to create a version of Watson that was optimized for the Turing test, it would probably come pretty close.
American philosopher John Searle (born in 1932) argued recently that Watson is not capable of thinking. Citing his "Chinese room" thought experiment (which I will discuss further in chapter 11), he states that Watson is only manipulating symbols and does not understand the meaning of those symbols. Actually, Searle is not describing Watson accurately, since its understanding of language is based on hierarchical statistical processes, not the manipulation of symbols. The only way that Searle's characterization would be accurate is if we considered every step in Watson's self-organizing processes to be "the manipulation of symbols." But if that were the case, then the human brain would not be judged capable of thinking either.
It is amusing and ironic when observers criticize Watson for just doing statistical analysis of language as opposed to possessing the "true" understanding of language that humans have. Hierarchical statistical analysis is exactly what the human brain is doing when it is resolving multiple hypotheses based on statistical inference (and indeed at every level of the neocortical hierarchy). Both Watson and the human brain learn and respond based on a similar approach to hierarchical understanding. In many respects Watson's knowledge is far more extensive than a human's; no human can claim to have mastered all of Wikipedia, which is only part of Watson's knowledge base. Conversely, a human can today master more conceptual levels than Watson, but that is certainly not a permanent gap.
One important system that demonstrates the strength of computing applied to organized knowledge is Wolfram Alpha, an answer engine (as opposed to a search engine) developed by British mathematician and scientist Dr. Stephen Wolfram (born 1959) and his colleagues at Wolfram Research. For example, if you ask Wolfram Alpha (at WolframAlpha.com), "How many primes are there under a million?" it will respond with "78,498." It did not look up the answer; it computed it, and following the answer it provides the equations it used. If you attempted to get that answer using a conventional search engine, it would direct you to links where you could find the algorithms required. You would then have to plug those formulas into a system such as Mathematica, also developed by Dr. Wolfram, but this would obviously require a lot more work (and understanding) than simply asking Alpha.
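The prime-count example is easy to verify for yourself. A minimal sketch in Python (my own illustration, not anything Alpha actually runs) using a sieve of Eratosthenes:

```python
def count_primes_below(n):
    """Count the primes less than n with a sieve of Eratosthenes."""
    if n < 3:
        return 0
    sieve = [True] * n
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Mark every multiple of p starting at p*p as composite.
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return sum(sieve)

print(count_primes_below(1_000_000))  # 78498, matching Alpha's answer
```

Alpha, of course, does far more than this single computation: it selects the method, runs it, and explains it, which is the point of the example.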
Indeed, Alpha consists of 15 million lines of Mathematica code. What Alpha is doing is literally computing the answer from approximately 10 trillion bytes of data that have been carefully curated by the Wolfram Research staff. You can ask a wide range of factual questions, such as "What country has the highest GDP per person?" (Answer: Monaco, with $212,000 per person in U.S. dollars), or "How old is Stephen Wolfram?" (Answer: 52 years, 9 months, 2 days as of the day I am writing this). As mentioned, Alpha is used as part of Apple's Siri; if you ask Siri a factual question, it is handed off to Alpha to handle. Alpha also handles some of the searches posed to Microsoft's Bing search engine.
In a recent blog post, Dr. Wolfram reported that Alpha is now providing successful responses 90 percent of the time.17 He also reports an exponential decrease in the failure rate, with a half-life of around eighteen months. It is an impressive system, and uses handcrafted methods and hand-checked data. It is a testament to why we created computers in the first place. As we discover and compile scientific and mathematical methods, computers are far better than unaided human intelligence in implementing them. Most of the known scientific methods have been encoded in Alpha, along with continually updated data on topics ranging from economics to physics. In a private conversation I had with Dr. Wolfram, he estimated that self-organizing methods such as those used in Watson typically achieve about an 80 percent accuracy when they are working well. Alpha, he pointed out, is achieving about a 90 percent accuracy. Of course, there is self-selection in both of these accuracy numbers in that users (such as myself) have learned what kinds of questions Alpha is good at, and a similar factor applies to the self-organizing methods. Eighty percent appears to be a reasonable estimate of how accurate Watson is on Jeopardy! queries.
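Wolfram's reported half-life implies a simple decay model. A sketch (my own illustration of the arithmetic, not Wolfram's analysis) of how a failure rate with an eighteen-month half-life shrinks over time:

```python
def failure_rate(initial_rate, months, half_life_months=18.0):
    """Exponential decay: the failure rate halves every half_life_months."""
    return initial_rate * 0.5 ** (months / half_life_months)

# Starting from the reported 10 percent failure rate:
for m in (0, 18, 36, 54):
    print(f"{m:2d} months: {failure_rate(0.10, m):.4f}")
# 10% -> 5% -> 2.5% -> 1.25% over successive eighteen-month periods
```

At that pace the failure rate drops by roughly an order of magnitude every five years, which is the force of Wolfram's observation.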