The Turing Tests of today are mistaken

How Goodhart's law holds back AI

Companies like OpenAI try to show that AIs are intelligent by hyping their high scores in behavioural tests – an approach with roots in the Turing Test. But there are hard limits to what we can infer about intelligence by observing behaviour. To demonstrate intelligence, argues Raphaël Millière, we must stop chasing high scores and start uncovering the mechanisms underlying AI systems’ behaviour.

 

Public discourse on artificial intelligence is divided by a widening chasm. While sceptics dismiss current AI systems as mere parlour tricks, devoid of genuine cognitive sophistication, evangelists and doomsayers view them as significant milestones on the path toward superhuman intelligence, laden with utopian potential or catastrophic risk.

This divide is symptomatic of a deeper methodological disagreement: there is no consensus among experts on how to adequately evaluate the capacities of AI systems. Researchers tend to rely on behavioural tests to evaluate AI systems’ capacities, but this methodology is flawed, and at best only partial fixes are available. In order to properly assess which capacities AI systems have, we must supplement improved behavioural tests with investigation of the causal mechanisms underlying AI behaviour.

The challenge of assessing AI systems’ intelligence invariably conjures the Turing Test. Alan Turing’s “imitation game” involves a human interrogator communicating by teleprinter with a computer and another human, and attempting to determine which is which based solely on their responses. The computer’s objective is to cause the interrogator to incorrectly identify it as the human. Turing predicted that by 2000, computers would be able to play this game well enough that the average interrogator would have no better than a 70% chance of correctly identifying the machine after five minutes of questioning.

___

In theory, benchmarks should allow for rigorous and piecemeal evaluations of AI systems, helping foster broad consensus about their abilities.
