IBM’s Watson computer program has largely failed in the company’s plan to have it provide personalized advice to doctors treating patients with cancer.
What does it mean?
In 2011 the computer program Watson crushed the greatest human stars of the TV quiz show Jeopardy. Full disclosure: I am a big fan of Jeopardy. For those not in the know, the show involves three contestants who push a buzzer after a clue is read by the host; the first to buzz in must give the answer in the form of a question. Topics cover a wide range of history, science, art, literature, and popular culture, often including word play such as puns.
The Jeopardy rules were bent for Watson, with the episode being taped at the IBM Research Center, not at the usual TV studio. Certain categories of questions were omitted: audiovisual clues and clues that require explanation of how to interpret the clue. Also, the clues were transmitted to the computer in text, not orally. Speech recognition is still a tricky task for computers and this concession can be viewed as giving the computer a large advantage. Watson, like its human competitors, was not allowed to access the Internet during play.
Jeopardy players report that buzzer skills count at least as much as knowledge. Often all three players will know the correct reply, and the player who can buzz in quickly, perhaps anticipating the host’s cadence in reading the clue, will win the money. Human players are notified that they can buzz in by the appearance of a light, but Watson was notified by an electronic signal, again perhaps giving an advantage to the machine. Watson was required to press a buzzer as the humans did, but, when highly confident, Watson could “hit the buzzer in as little as 10 milliseconds, making it very hard for humans to beat,” as reported by the New York Times. The questions used in the match were not at a high level for Jeopardy, meaning that buzzer skills probably weighed heavily in the result.
About 20 researchers took three years to develop Watson. The components of Watson were designed for the specific Jeopardy task. The team identified types of Jeopardy questions and determined the language that would indicate the type of question (e.g., Factoid). Its knowledge base, compiled from Wikipedia, encyclopedias, and some databases of specific information, was structured in various ways to aid quick retrieval. Wikipedia was prioritized as a source because analysis had shown that about 95% of Jeopardy answers appear in the titles of Wikipedia pages. Watson had different components working in parallel to generate candidate answers, which were then evaluated for confidence. The developers used the work of others (including some open-source programs) to develop these components. Other components decided whether to buzz in, which square to pick next, and the amount to wager on a Daily Double or in Final Jeopardy.
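Stripped of the engineering detail, the architecture described above can be sketched in a few lines: several candidate generators propose answers, evidence scorers rate each candidate, and a confidence threshold gates the decision to buzz in. Everything in this sketch (the function names, the toy scoring rule, the sample clue) is a hypothetical illustration for the reader, not IBM’s actual code:

```python
# Toy sketch of a generate-then-score question-answering pipeline.
# All names and the scoring rule are invented for illustration.

def title_lookup(clue):
    # Stand-in for searching an index of Wikipedia page titles.
    return ["1920s"] if "decade" in clue else []

def keyword_search(clue):
    # Stand-in for full-text search over an encyclopedia corpus.
    return ["1920s", "1930s"] if "decade" in clue else []

GENERATORS = [title_lookup, keyword_search]

def score(candidate, clue):
    # Stand-in for many evidence scorers combined by a trained model:
    # here, simply count how many generators independently propose it.
    return sum(1.0 for gen in GENERATORS if candidate in gen(clue))

def answer(clue, buzz_threshold=1.5):
    # Gather candidates from all generators, score them, and buzz in
    # only if the best candidate's confidence clears the threshold.
    candidates = {c for gen in GENERATORS for c in gen(clue)}
    if not candidates:
        return None  # stay silent
    best = max(candidates, key=lambda c: score(c, clue))
    return best if score(best, clue) >= buzz_threshold else None

print(answer("This decade roared"))  # both sources agree, so it buzzes: 1920s
print(answer("An obscure clue"))     # no candidates, so it stays silent: None
```

The key design point, which the real system shared, is that generation is cheap and permissive while scoring is strict: it is easier to rank many candidate answers against evidence than to produce the single right answer directly.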
At the time of its Jeopardy win, Watson was touted by IBM as holding promise in more serious applications. The Guardian, for example, reported “IBM plans to use Watson’s linguistic and analytical abilities to develop products in areas such as medical diagnosis.” And the New York Times reported
“For I.B.M., the future will happen very quickly, company executives said. On Thursday it plans to announce that it will collaborate with Columbia University and the University of Maryland to create a physician’s assistant service that will allow doctors to query a cybernetic assistant. The company also plans to work with Nuance Communications Inc. to add voice recognition to the physician’s assistant, possibly making the service available in as little as 18 months.”
What does it mean for you?
Answering questions posed in natural language is a hard task for computers, and the IBM researchers should be congratulated for their achievement. But Artificial Intelligence (AI) has a history of hype, and many are skeptical of IBM’s purpose in creating Watson. Some argue that IBM, not doing well in its core business, used Watson as a marketing tool, not as serious science. IBM has a history of undertaking publicity-catching projects, which it calls Grand Challenges, such as the chess machine that beat Garry Kasparov in 1997. As an academic, I looked for, and could not find, a statement of how the creation of Watson contributed to advancing the theory of AI. IBM, of course, argued that its purpose was to advance application, especially in medicine, but the results have been disappointing.
Some of the information needed to decide on a medical diagnosis, such as lab results and measurements of vital signs, is easily used by a computer program, but much of the information is in unstructured notes from doctors. Watson has, some think, the potential to help with such problems, but it has not been successful. An April 2019 article in IEEE Spectrum says that there have been no peer-reviewed papers of consequence showing a contribution to medical care by Watson. The article also describes how IBM’s efforts to offer advice in oncology were stymied by Watson’s inability to extract the information relevant to treatment from the vast array of literature.
Watson has been more widely used outside the US, but again perhaps based on marketing wins. As the IEEE Spectrum article put it: “Many of these hospitals proudly use the IBM Watson brand in their marketing, telling patients that they’ll be getting AI-powered cancer care.” Actual results from those hospitals don’t seem to support a claim that the program offers a high level of care.
During the Jeopardy match, Watson failed in some laughable ways. For example, after an incorrect human response of “What are the ’20s?” Watson buzzed in and offered “What is the 1920s?” The failure came from the fact that Watson was not programmed to listen to previous answers. Again, computer programs are marvelous at the task they are programmed to do, and only at that task. But, more puzzlingly, Watson answered “What is Toronto?” in a Final Jeopardy category of U.S. Cities.
Some AI researchers will argue that progress is being made, and patience is required before these approaches demonstrate value. But climbing a tree is not the first step in sending a human to the moon. Does the ability of computer systems to perform on a quiz show mean that such programs are on the way to truly intelligent behavior? Perhaps the answer does not matter. In any useful application of advanced computing, the program is tailored to a specific task and only needs to be good at that task. A program that optimizes the routing of jobs in a factory doesn’t need to know how to tie its own shoes.
Before Watson competed on Jeopardy, IBM and the Jeopardy show had long negotiations leading to a version of Jeopardy tailored to Watson in many ways. AI may eventually be able to contribute to medical care, but only after the medical environment is changed to be more conducive to computer approaches. Electronic records are a step toward making a patient’s record more accessible to a computer, but a doctor’s notes, even when no longer hand-written, can still be ambiguous or confusing. Context matters, and computers are very bad at understanding context. Just as many believe that self-driving cars will only succeed in a carefully controlled driving environment, perhaps with no human-driven vehicles on the same roads, the medical data collection system may need considerable change to become Watson-friendly.
I think that the term “artificial intelligence” distracts us from the progress being made in using computers to aid human endeavor. Philosophers and engineers have spent decades arguing over whether computers exhibit intelligence. With every AI achievement, the goal posts are moved. Human experts have all fallen to computer programs in checkers, chess, and Go. With the win on Jeopardy, the cry becomes “When Watson wins ‘Dancing With The Stars’ or even ‘The Amazing Race,’ I’ll be impressed.”
The important fact is that computers can reliably deliver amazing results for a narrowly defined task. However, the results are only as good as the programmer’s foresight in anticipating all the situations that may arise even in that narrowly defined task. When programs fail, they can do so in ways that baffle humans. Artificial stupidity seems amply demonstrated.
Where can you learn more?
“Jeopardy! as a Modern Turing Test: Did Watson Really Win?” explains the AI approaches used in creating Watson. The IBM Watson Research Team described the technical aspects of Watson in the 2010 Fall issue of AI Magazine.
My discussion of the failures in using Watson in medical care relies heavily on the April 2019 article in IEEE Spectrum.
The strongest arguments against the current methods of artificial intelligence come from philosopher John Searle in the Chinese room thought experiment and from the Dreyfus brothers, the late Berkeley philosophy professor Hubert and Berkeley engineering professor Stuart (full disclosure, I studied with Stuart Dreyfus for my PhD in industrial engineering), in their various books, including Mind over Machine.