Elementary, My Dear Watson

The impressive demonstration of IBM’s Watson supercomputer competing against human players on Jeopardy is certain to stir a lively debate about the implications of this technology. Is Watson a breakthrough or a clever set of programming tricks that have inherent limitations? Will technology like Watson replace humans? And who will be liable if a human makes a bad decision based on technology like Watson? Even within the computer science and artificial intelligence (AI) community, Watson’s performance will provoke disagreements about its significance and its practical applications. However, such questions and issues often arise with the creation of new technologies, whether we are talking about healthcare, communication, transportation, energy, or other innovations that impact our daily lives. Such concerns rarely stop adoption.

It seems clear that Watson is no more a substitute for human reasoning or action than our desktop computers or smartphones; like them, it is simply a tool. Watson’s creators have not proposed it as a replacement for human thinking, but as an enhancement of human processing. Despite Watson winning Jeopardy against human competitors, the IBM team noted that the technology is a win for humans. By building better tools, we extend our abilities.

Without a doubt there are limitations to this type of technology. Some of those were obvious over the three days of Watson’s competitive participation on TV. However, Watson’s accomplishment stands as one of the best examples of where technology is headed in terms of our computers being able to deal with language-based information. While we see this already in improvements to Internet search engines, Jeopardy represents an even greater challenge because the game requires more than parsing words and phrases and returning a list of related information. To play, Watson also had to interpret the many nuances found in the English language, including puns and subtle wordplay, and come up with a single correct response.

I have long had an interest in interfaces that enable users to interact naturally with computers. It started over 30 years ago, when I learned to program in BASIC, a computer language that is very English-like in its syntax. Arriving at Microsoft in 1981, I was excited to be assigned responsibility for managing what was then Microsoft’s flagship product line of BASIC interpreters and compilers. For me, BASIC unlocked the power of PCs, which otherwise required cryptic assembly language understood only by programming wizards. In the early PC world, with its limited applications, BASIC was one of the simplest ways to communicate with your computer.

Later, I moved over to Microsoft Windows, where I explored the benefits of the graphical presentation of information and its point-and-click interface. However, despite the way that graphical user interfaces expanded the market for computers, I still wanted to go further in terms of defining natural user interfaces. So in the late 1990s, I assembled a small team to create Microsoft Agent, a simple technology that enabled interactive virtual personalities as part of the user interface. I was convinced that the failure of Microsoft Bob (and its later successor, the Office Assistant) was not because it was a bad concept, but because the technology was too immature and limited in how it could be applied.

To partially address this, we included support for speech technologies in an attempt to facilitate a conversational style of interaction. However, despite claims of greater than 90% recognition accuracy, it quickly became clear that conversational dialogues could not rely solely on existing speech recognition engines, since such technologies focus only on matching the sounds of words. Such an interface would require a greater understanding of context, including the syntax and semantic meanings of words. So as a next step, we proposed to build a simple language processing framework where dialogue templates could be used to help structure a conversation. However, my efforts had to be discontinued over doubts about whether it was a solvable problem.
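To give a sense of what a dialogue template means in practice, here is a minimal, hypothetical sketch of the idea: patterns with named slots map a user’s sentence to a structured intent, which could then drive the next turn of a conversation. The template syntax, intent names, and patterns below are my own illustration, not the actual Microsoft framework, whose details were never published.

```python
import re

# Illustrative dialogue templates: each pairs a pattern containing named
# slots with an intent label. These examples are hypothetical.
TEMPLATES = [
    (re.compile(r"what is the (?P<property>\w+) of (?P<subject>[\w ]+)\??", re.I),
     "query_property"),
    (re.compile(r"set (?P<property>\w+) to (?P<value>[\w ]+)", re.I),
     "set_property"),
]

def match_utterance(text):
    """Return (intent, slots) for the first template that matches, else None."""
    for pattern, intent in TEMPLATES:
        m = pattern.fullmatch(text.strip())
        if m:
            return intent, m.groupdict()
    return None

print(match_utterance("What is the capital of France?"))
# → ('query_property', {'property': 'capital', 'subject': 'France'})
```

Even this toy version shows both the appeal and the limitation the paragraph above describes: templates impose useful structure on a dialogue, but they capture none of the syntax or semantics beyond the literal patterns an author anticipates.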

I had been inspired in part by SHRDLU, a legendary language processing program said to enable users to manipulate objects in a virtual block world using typed sentences. However, when I consulted its creator, Dr. Terry Winograd (an eminent researcher in AI and human-computer interaction, and later an advisor to Google founder Larry Page), about the details of his approach, he indicated that the work was mostly a cleverly crafted demo, one that even he didn’t think could actually be successfully implemented. Similarly, Doug Lenat’s limited progress in creating Cyc, a knowledge-based engine built on an extensive framework of relationships between words and concepts, seemed, like many other research projects, to suggest that successful language or knowledge understanding technology was unattainable.

I had almost given up hope that any progress was possible until Wolfram Alpha was introduced in 2009. Wolfram’s team had demonstrated the ability to use language to retrieve specific knowledge. Then in 2010, I discovered Siri, a clever language-based search engine that was the offspring of the DARPA-sponsored Personal Assistant that Learns (PAL) project. Like Watson, these projects demonstrate that despite the failures I had seen, there have been positive developments in advancing technologies that enable the use of language for accessing information from a computer.

So you may ask, what does this have to do with personal robots? First, it is important to recognize that one of the key missing features holding back the emergence of personal robots (of any complexity beyond the Sony Aibo or iRobot Roomba) is the lack of a successful user interface. One can view very impressive robot demonstrations today, but most imply that our interaction with personal robots will be facilitated by simply speaking with the robot. Unfortunately, that is both a naive and misleading assumption for all the reasons already mentioned. Successful human-robot interaction will require advancing user interface technology beyond the keyboard, the mouse, and speech recognition technologies employed as-is. Further, a wealth of research already indicates that regardless of function, we humans tend to regard robots as social actors, and therefore expect personal robots to interact with us and our environment in a natural, socially oriented way. That means we will have to advance our thinking around interface technologies. Watson represents a step in the right direction.

That said, I don’t imagine Watson hosted on a personal robot in the immediate future, considering that it requires a large battery of networked multicore servers. We are more likely to have robots with dexterous arms and hands before we have the ability to put that much processing onboard. However, if one assumes that personal robots must have a broadband connection to cloud services, it may be possible for Watson-like capabilities to be accessed through the robot’s remote connection.

Watson also demonstrated a very subtle attribute that has important implications for successful human-robot interaction. Few seem to mention it, perhaps because it is what we expect or assume in language-based dialogue. Watson’s synthetic voice was not strictly monotone, as we might stereotype a computer or robot voice. Watson’s designers were clever to give it a voice that sounded human not only in its pitch and range, but also in its prosody, the melody of how we speak. This was even more apparent in the crafted dialogues Watson had during its interviews, but it was subtly apparent in the game as well. This seemingly small detail is almost as important as the actual processing of information, especially when considered in terms of our interaction with robots, which I will address in more detail in a future blog.

My congratulations go out to the IBM team for their impressive accomplishment. Watson not only raises the prospects for how we may interact with computers, but also demonstrates the benefits of cloud computing, enabling us to access computing power beyond our desktops. As such, Watson may not only help doctors better diagnose their patients, but also enable an aging Boomer generation to manage their own health and wellness. Not since IBM’s introduction of its PC in 1981 has the company stirred so much interest in its technology.