Our Quest to Chat with HAL
By Dr. Larry Harris
HAL: "I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do."
The promise of conversational interactions with machines -- like those with HAL in 2001: A Space Odyssey -- is still science fiction, but we have reached major milestones in recent years and are closer than ever before.
Conversational interaction with machines requires solutions in two major areas: voice recognition and natural language understanding. Voice recognition translates speech into text. Natural language understanding extracts the meaning of those words in that particular order. Although they are two separate areas of the broader discipline of artificial intelligence, voice recognition and natural language understanding have undergone a similar progression. Success in early systems came from a more structured grammatical approach, whereas recent progress in both areas has come from a more statistical approach.
Some of the smartest minds at IBM, along with Ray Kurzweil and many others, began tackling voice recognition in the '70s and '80s, and some of the first products, such as IBM's ViaVoice, appeared in 1997. Looking back over this period, two surprising facts jump out. First, Dragon NaturallySpeaking, introduced on April 2, 1997, has dominated this category ever since; it is astonishing to see any single product or core technology lead a category for so long. Second, the husband-and-wife team of Drs. James and Janet Baker, the founders of Dragon, never got the credit or financial reward they deserved for such a great technical accomplishment.
Dragon was sold in 2000 to Lernout & Hauspie for stock in the company. The Bakers had a customary six-month lockup agreement, during which Lernout & Hauspie became embroiled in a major financial scandal, causing the stock to plummet and forcing a fire sale of L&H assets. Dragon NaturallySpeaking was eventually acquired by Nuance -- the leader in voice recognition technology we are familiar with today.
The first natural language understanding applications were the question-and-answer systems of the late '70s and '80s. SRI's first effort in this area was a research system called LADDER. AICorp's Intellect was notable because it became the first software product from an outside vendor to be sold by IBM. More recently, IBM had spectacular success of its own when its Watson system famously beat reigning Jeopardy! champions in a televised match.
The proof of proper understanding of questions is, of course, providing the right answers. In modern commercial systems, such as EasyAsk (the company I work for), this is done by generating a SQL query that, when run against a relational database, will yield the right answer. For example, a question such as "What customers had orders in the last year but had no orders in the last quarter?" would result in a complex SQL query. The answer, of course, is of particular interest because these customers may need some attention from the sales force.
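EasyAsk's actual generated SQL is not shown in this article, but the shape of the translation can be sketched against a hypothetical two-table schema (customers and orders are my own illustrative names, not EasyAsk's). The "orders in the last year but none in the last quarter" question becomes an EXISTS / NOT EXISTS pair of correlated subqueries:

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema for illustration; the real generated SQL would
# depend on the customer's actual database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT);
""")

today = date(2012, 6, 1)  # fixed "today" so the example is reproducible
year_ago = (today - timedelta(days=365)).isoformat()
quarter_ago = (today - timedelta(days=91)).isoformat()

conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, "2011-09-15"),   # Acme: ordered last year only
                  (2, 2, "2012-05-20")])  # Globex: ordered this quarter

# "What customers had orders in the last year but had no orders in the
# last quarter?" -- one way a generated query might express it:
rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.customer_id = c.id AND o.order_date >= ?)
      AND NOT EXISTS (SELECT 1 FROM orders o
                      WHERE o.customer_id = c.id AND o.order_date >= ?)
""", (year_ago, quarter_ago)).fetchall()
print([r[0] for r in rows])  # prints ['Acme']
```

Acme qualifies (an order last September but nothing since), while Globex, with a recent order, is excluded -- exactly the lapsed customers the sales force would want to see.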
Apple's Siri product has shown that this database model can be extended to generating calls to smartphone apps that can take appropriate action. An example of this is asking your iPhone 4S "How far away is Cleveland?" Siri understands the meaning of your question and invokes the Maps app with specific parameters, causing it not only to answer the question but also to provide directions. Similarly, events can be added to your calendar or to-do items added to your reminders.
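Siri's internals are not public, but the dispatch step described above can be sketched: match the understood question against a table of intents, extract the parameters, and hand them to the matching app. The intent patterns and app names below are my own illustrative assumptions, not Apple's API:

```python
import re

# Hypothetical intent table: pattern -> (app to invoke, parameter name).
INTENTS = [
    (re.compile(r"how far away is (?P<place>.+)\?", re.I),
     "Maps", "destination"),
    (re.compile(r"remind me to (?P<task>.+)", re.I),
     "Reminders", "task"),
]

def dispatch(utterance):
    """Return (app, {param: value}) for the first matching intent."""
    for pattern, app, param in INTENTS:
        m = pattern.match(utterance)
        if m:
            return app, {param: m.group(1).strip()}
    return None, {}  # no target app can act on this request

print(dispatch("How far away is Cleveland?"))
# prints ('Maps', {'destination': 'Cleveland'})
```

Note the fall-through case: when no pattern matches, there is simply no app to invoke -- one of the failure modes discussed next.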
These systems aren't perfect. They can fail to understand and respond to a request for a number of reasons: the voice recognition can't recognize the words, the natural language system can't understand the meaning, or there is no target app to actually respond to the request. Now that these systems are in widespread use, we can expect improvements in all three areas.
Given the statistical nature of the algorithms for both the voice recognition and language analysis, the large databases of actual questions being collected will be further grist for the statistical mill. The data extracted from the huge volume of voice requests today will feed the success of future natural language software deployments -- each individual interaction providing a bit more valuable phonetic and linguistic perspective that improves a machine's understanding.
When Apple opens its API for Siri, you can imagine that developers will work quickly to put Siri in front of their applications. The range of questions you can ask and actions you can take will grow exponentially as new apps are added or updated. This means that questions that cannot be answered today will be answerable tomorrow.
We can't predict all the things people will ask their smartphones and computers to do in the near future, let alone 40 years from now. For sure, the more people who use this technology, the more quickly it will evolve and the better it will become. It is hard to believe 2001: A Space Odyssey was filmed in 1968. Well over 40 years later, we still have a lot of work to do on the space flight front, but natural language processing is almost there.
HAL, meet Siri.
Dr. Larry Harris is an expert on database systems and computerized natural language. He founded EasyAsk, a natural language technology and solutions provider. Prior to EasyAsk, Dr. Harris founded and led Linguistic Technology Corporation and Artificial Intelligence Corporation (AICorp). Harris' early research involved a unique language analysis technique that became the foundation of Intellect, AICorp's mainframe natural language product, the first third-party software sold by IBM. Harris was also the chief architect of KBMS, an expert system tool.
Prior to founding AICorp, Harris was a professor of computer science at Dartmouth College and a visiting professor at MIT's AI Lab. Harris, who received a Ph.D. in computer science from Cornell University, is the author of AI Enters the Marketplace (Bantam Publishing, 1986). You can contact the author at firstname.lastname@example.org.