Ever since 1950, one of the most popular measuring sticks of artificial intelligence has been the Turing test — named after mathematician Alan Turing. The idea is that a program with some kind of artificial intelligence should be able to use text-based chatting to convince more than 30 percent of people that it's a human being. In June 2014, researchers claimed that a chatbot named Eugene Goostman did just that.
Nowadays, however, many experts are questioning whether the Turing test is really the best test. A computer tricking people into thinking that it's a 13-year-old is definitely an achievement — but it's not necessarily the ideal display of true, humanlike thought.
So what would be a better test for artificial intelligence? One front-runner is an exam that relies on common sense. Specifically, the test is built from puzzles called Winograd schemas. Because Winograd schemas rely on cultural knowledge, they're super easy for people and difficult for computers.
How to test computers for common sense
The test would take the form of a multiple-choice reading-comprehension quiz. But the text itself would have some very specific features. It would consist of Winograd schemas: pairs of sentences whose intended meaning can be flipped by changing just one word. They generally involve unclear pronouns or possessives. A famous example comes from Stanford computer scientist Terry Winograd:
- "The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?"
1) The city councilmen
2) The demonstrators
- "The city councilmen refused the demonstrators a permit because they advocated violence. Who advocated violence?"
1) The city councilmen
2) The demonstrators
Most human beings can easily answer these questions. We use our common sense to figure out what "they" is supposed to be referring to in each case. And that common sense basically involves a combination of extensive cultural background knowledge with analytical skills. (In the first question, we can deduce that the city councilmen feared violence. In the second, the demonstrators advocated violence.)
For computers, however, these questions can be quite difficult. From a grammatical standpoint, the "they" in the sentences is technically unclear. In both questions, "they" could be either the councilmen or the demonstrators.
A computer could have access to all of Google and still not really be able to grasp that city councilmen are probably less likely to advocate violence than demonstrators. It's simply less culturally appropriate for councilmen to do so. But you're not going to find that in the dictionary under "city councilmen."
Here are some more Winograd schemas, from a growing, open collection of more than 100:
- The trophy doesn't fit into the brown suitcase because it's too [small/large]. What is too [small/large]?
Answers: The suitcase/the trophy.
- Jane gave Joan candy because she [was/wasn't] hungry. Who [was/wasn't] hungry?
Answers: Joan/Jane
- The woman held the girl against her [chest/will]. Whose [chest/will]?
Answers: The woman's/the girl's
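One way to picture the structure of these puzzles is as a small data record: a sentence template with one slot, the two words that can fill it, the two candidate referents, and the correct answer for each word. The sketch below is only an illustration. The class name `WinogradSchema` and its fields are made up for this example and are not part of any official format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical container for one schema (not an official format)."""
    template: str      # sentence and question with a {word} slot
    candidates: tuple  # the two possible referents
    answers: dict      # special word -> correct referent

    def questions(self):
        """Yield (question text, correct answer) for each word variant."""
        for word, answer in self.answers.items():
            yield self.template.format(word=word), answer


trophy = WinogradSchema(
    template=(
        "The trophy doesn't fit into the brown suitcase because it's "
        "too {word}. What is too {word}?"
    ),
    candidates=("the trophy", "the suitcase"),
    answers={"small": "the suitcase", "large": "the trophy"},
)

for text, answer in trophy.questions():
    print(text, "->", answer)
```

Swapping the single word in the slot produces two questions that look nearly identical but have opposite answers, which is the defining feature of a schema.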
In 2011, University of Toronto computer scientist Hector Levesque proposed using a bunch of multiple-choice Winograd schemas as an alternative to the Turing test.
Levesque said that test designers should pick Winograd schemas that are simple for humans to solve. The schemas also shouldn't be Google-hackable. Basically, the computer shouldn't be able to solve the question by only analyzing the statistical frequency of certain words appearing together in a large collection of English-language texts (aka the Internet).
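So, for instance, Levesque gives the example of "The racecar zoomed by the school bus because it was going so [fast/slow]" as a question that fails this requirement, since words like "racecar" and "fast" tend to appear together in text.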
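To see what "Google-hackable" means in practice, here is a minimal sketch of the kind of shortcut the questions are supposed to rule out: pick whichever candidate shows up most often in the same sentence as the special word. The tiny `corpus` is invented purely for illustration, and `cooccurrence_guess` is a hypothetical helper, not a method taken from Levesque's work.

```python
import re


def cooccurrence_guess(corpus, candidates, word):
    """Pick the candidate that shares the most sentences with `word`.

    Uses no understanding of the sentence at all, only word statistics.
    """
    counts = {c: 0 for c in candidates}
    for sentence in corpus:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if word in tokens:
            for c in candidates:
                if c in tokens:
                    counts[c] += 1
    return max(candidates, key=lambda c: counts[c]), counts


# Invented toy corpus standing in for "a large collection of texts".
corpus = [
    "The racecar was fast on the track.",
    "A fast racecar won the race.",
    "The bus was slow in traffic.",
    "The slow bus arrived late.",
    "The bus was fast today.",
]

guess, counts = cooccurrence_guess(corpus, ["racecar", "bus"], "fast")
print(guess, counts)
```

With this toy corpus, the statistical guess picks the racecar for "fast" and the bus for "slow" without reading the sentence at all. A question that can be answered this way tests word statistics, not common sense.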
Levesque has laid out several reasons why a Winograd schema test could be better than a Turing test. "A machine should be able to show us that it is thinking without having to pretend to be somebody," he writes. "Our WS challenge does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses." And, unlike the Turing test, which is scored by a panel of human judges, he notes that grading a Winograd schema test is completely non-subjective.
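Because every question has exactly one correct choice, grading comes down to comparing answers against a key. Here is a minimal sketch, assuming answers are stored as plain strings. The function name `score` and the sample responses are hypothetical.

```python
def score(answer_key, responses):
    """Return the fraction of questions answered correctly."""
    if len(answer_key) != len(responses):
        raise ValueError("answer key and responses differ in length")
    if not answer_key:
        raise ValueError("no questions to grade")
    correct = sum(k == r for k, r in zip(answer_key, responses))
    return correct / len(answer_key)


# Answer key taken from the examples in this article.
answer_key = ["councilmen", "demonstrators", "suitcase", "trophy"]
# Made-up responses from an imaginary program.
responses = ["councilmen", "councilmen", "suitcase", "trophy"]

print(score(answer_key, responses))
```

No judge has to decide whether a reply "sounds human." Since each question has two choices, blind guessing would land at about 50 percent on average, so a meaningful result has to be well above that.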
Will computers ever pass this new Winograd schema test?
In the past, programmers have often tried to devise programs that could pass a Turing test. The Loebner Prize, for example, offers a top award of $100,000 for a chatbot that can convince judges it's human during a five-minute period involving both text and audio.
And now there's a new competition on the scene. The Winograd Schema Challenge will have its first annual competition in 2015 and will offer $25,000 for any computer program that can reach human levels of performance on a test of at least 40 such puzzles that the computer has never seen before. The competition is organized by computer-science nonprofit Commonsense Reasoning and funded by computer-software company Nuance Communications.
And how far away are computers from achieving this goal? Altaf Rahman and Vincent Ng, of the University of Texas at Dallas, used machine-learning techniques on 30 similar questions and got to an accuracy of 73 percent. Not bad. But any reasonably intelligent person should get 100 percent correct, so there's still a ways to go.
Alan Turing did many other (arguably more important) things in his life besides coming up with the Turing test, including building an early computer that he used to break encrypted German messages during World War II. That work is the focus of the movie The Imitation Game, starring Benedict Cumberbatch, which was released in theaters on November 28.