IEEE Spectrum
By Eliza Strickland
Photo: John S. Lander/LightRocket/Getty Images. Caption: Students, human and robotic, must pass an exam to get through the gates of the University of Tokyo.
“Passing the exam is not really an important research issue, but setting a concrete goal is useful,” says Noriko Arai, the team leader and a professor at NII. And by having the AI answer real questions from the exams, “we can compare the current state-of-the-art AI technology with 18-year-old students,” she says. The latest results show that her protégé is coming along well in subjects like history and reading comprehension.
The project began in 2011, when the director of NII challenged his professors to come up with a problem that was “stupendously big and stupendously difficult,” as Arai describes it, but could be easily understood by the general public. The University of Tokyo, known locally as Todai, has a legendarily difficult entrance exam, and the problem came to Arai in an elevator: “Could a robot get into Todai?” she wondered. Thus the Todai Robot was born.
By 2016, the team hopes its AI will achieve a high score on the national standardized test, which includes multiple-choice questions in subjects such as physics and world history and requires students to solve math problems. But the machine-learning and natural-language-processing tools Arai’s team is developing for that test won’t prepare it for the Todai exam, which includes written essays. The team hopes the AI will pass the Todai exam by 2021, although they don’t yet know how it will accomplish that goal. “The generation of text from information has not been studied very much,” says NII associate professor Yusuke Miyao, another member of the team.
Even in the standardized test, each subject poses distinct challenges. The math portion, where an AI might be expected to excel, is made more complicated because the questions are presented as word problems, which the Todai Robot must translate into equations that it can solve. Physics is difficult too, because it presumes that the robot understands the rules of the universe. When equipped with a set of rules, however, the AI can simulate the scenario posed in a given question—for example, the trajectory of a missile—to arrive at the correct answer.
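The simulation approach described above can be sketched in a few lines. This is a minimal illustration, not the team’s actual code: it assumes a simple no-drag projectile model and compares the simulated result against multiple-choice answer options.

```python
import math

def projectile_range(speed, angle_deg, g=9.8):
    """Horizontal range of a projectile launched from the ground (no drag)."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / g

def pick_answer(speed, angle_deg, choices):
    """Choose the multiple-choice option closest to the simulated result."""
    result = projectile_range(speed, angle_deg)
    return min(choices, key=lambda c: abs(c - result))

# A projectile launched at 20 m/s and 45 degrees travels about 40.8 m,
# so among the choices below the closest option is 40.
print(pick_answer(20, 45, [10, 20, 40, 80]))
```

The harder part, as the article notes, is upstream of this step: translating the word problem into the right equations in the first place.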
Surprisingly, the Todai Robot turned out to be a star student in history, where its natural-language-processing skills really shine. Miyao, who leads the work on language processing, explains that the AI can find the answers to questions by searching a database that includes textbooks and Wikipedia. But it still needs to grasp the meaning of sentences and make the correct inferences. For example, the bot could try to determine whether the sentence “The janissaries were standing troops in the Ottoman Empire” is true or false. It might find in a textbook that janissaries were musketeers in the sultan’s household, but it then has to determine the “semantic equivalence” of musketeers and troops, Miyao says.
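The janissary example can be made concrete with a toy taxonomy lookup. This sketch is purely illustrative: the hand-written hypernym table stands in for the lexical resources (such as a WordNet-style thesaurus) a real system would consult, and none of it reflects the team’s actual implementation.

```python
# Toy hypernym table (hand-written for illustration; a real system
# would draw on lexical resources such as WordNet or a thesaurus).
HYPERNYMS = {
    "janissary": "musketeer",
    "musketeer": "soldier",
    "soldier": "troop",
}

def is_a(word, category):
    """True if `word` equals `category` or reaches it via hypernym links."""
    while word is not None:
        if word == category:
            return True
        word = HYPERNYMS.get(word)
    return False

# Musketeers count as troops under this toy taxonomy, supporting
# the textbook-based inference that janissaries were troops.
print(is_a("musketeer", "troop"))  # True
```

Chaining such lexical relations is one simple way to bridge the gap between what a textbook literally says and what an exam question asks.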
The team has recently tested the robot at several competitions organized for language-processing conferences. One task was multiple choice: the AIs had to pick which of four sentences was true, based on which was semantically equivalent to another sentence provided. Such a task would be easy for a human, because the answer is essentially given. In that exercise, the Todai Robot performed relatively well, with 57 percent correct. “This is much better than the random baseline of 25 percent, but still much worse than average high school students,” says Miyao.
A harder task came closer to mimicking a real entrance exam: Once again, the AI had to determine which of four sentences was true, but this time it had to search for the answer in its own databases. There, the Todai Robot got only 31 percent correct. The AI also has work to do on reading comprehension tasks, in which it has to analyze a story and answer questions about its plot and characters. New results from another recent competition showed that the Todai Robot did well at identifying the characters, but it had trouble determining which sentences would provide the correct answer to a question.
It’s clear that the bot still has some cramming to do before it’s ready to matriculate, but AI experts are nonetheless impressed with the NII effort. “I think it’s quite an interesting project, and quite an ambitious project, because they’re tackling so many subject areas,” says IBM researcher Jennifer Chu-Carroll. She works on Watson, the AI that crushed the human competition in the game show “Jeopardy!” in 2011. If the NII researchers create a program that can apply its natural-language-processing tools to math problems and history questions alike, they will “advance the state of natural-language understanding,” says Chu-Carroll.