The Turing Tests of today are mistaken

How Goodhart's law holds back AI

Companies like OpenAI try to show that AIs are intelligent by hyping their high scores in behavioural tests – an approach with roots in the Turing Test. But there are hard limits to what we can infer about intelligence by observing behaviour. To demonstrate intelligence, argues Raphaël Millière, we must stop chasing high scores and start uncovering the mechanisms underlying AI systems’ behaviour.


Public discourse on artificial intelligence is divided by a widening chasm. While sceptics dismiss current AI systems as mere parlour tricks, devoid of genuine cognitive sophistication, evangelists and doomsayers view them as significant milestones on the path toward superhuman intelligence, laden with utopian potential or catastrophic risk.

This divide is symptomatic of a deeper methodological disagreement: there is no consensus among experts on how to adequately evaluate the capacities of AI systems. Researchers tend to rely on behavioural tests to evaluate AI systems’ capacities, but this methodology is flawed, and at best only partial fixes are available. In order to properly assess which capacities AI systems have, we must supplement improved behavioural tests with investigation of the causal mechanisms underlying AI behaviour.

The challenge of assessing AI systems’ intelligence invariably conjures the Turing Test. Alan Turing’s "imitation game" involves a human interrogator communicating by teleprinter with a computer and another human, and attempting to determine which is which based solely on their responses. The computer's objective is to cause the interrogator to incorrectly identify it as the human. Turing predicted that by 2000, computers would be able to play this game well enough that the average interrogator would have no better than a 70% chance of correctly identifying the machine after five minutes of questioning.


In the age of large language models, like those that power OpenAI’s ChatGPT, the Turing Test may seem quaint. In fact, a recent paper found that GPT-4 fooled human interrogators in 41% of trials – clearing the 30% rate implied by Turing’s prediction, albeit with a two-decade delay. But even if future language models pass the test with flying colours, what should we make of this? It is doubtful that Turing himself intended his test to set strictly necessary and sufficient conditions for intelligence. As philosopher Ned Block emphasized, a system could in principle pass the test through brute force, answering every question by retrieving memorized answers stored in a giant look-up table. This suggests that the Turing Test provides at best defeasible evidence of intelligence.
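Block's look-up-table scenario can be made concrete in a few lines. The sketch below is purely illustrative (the questions and canned answers are invented): the "chatbot" answers by retrieval alone, computing nothing that resembles understanding, yet its behaviour on stored questions is indistinguishable from a thoughtful interlocutor's.

```python
# A toy version of Ned Block's thought experiment: a "chatbot" that answers
# purely by looking up memorized responses in a table. Entries are invented.
canned = {
    "what is your name?": "I'm Alex, nice to meet you.",
    "do you like poetry?": "Very much, especially sonnets.",
}

def lookup_bot(question):
    # Normalize the question, then retrieve a stored answer if one exists.
    return canned.get(question.lower().strip(), "Interesting question! Tell me more.")
```

A real look-up table covering every possible five-minute conversation would be astronomically large, which is why Block's point is conceptual rather than practical: passing the test is compatible, in principle, with zero intelligence.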

AI research has largely moved on from the Turing Test as a holistic assessment of intelligence or cognition. It has not, however, moved on from behavioural evaluations as a whole. These days, AI systems like language models are routinely evaluated through benchmarks – standardized tests designed to assess specific capabilities, often by comparison with human baselines. Unlike the Turing Test, they provide a quantitative assessment of how AI systems perform on various tasks, facilitating a direct comparison of their abilities in a controlled and systematic way. They also avoid the one-size-fits-all approach to evaluation, allowing researchers to test different capacities separately.

For example, a classic benchmark introduced in 2015 is based on the Stanford Natural Language Inference (SNLI) corpus, a large collection of pairs of English sentences manually labelled to indicate whether they entail each other, contradict each other, or are neutral with respect to each other. The corresponding benchmark consists in assigning the correct label to sentence pairs. The assumption is that achieving human-like performance on this benchmark is evidence of grasping inference relations between sentences.
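To make the task concrete, here is a minimal sketch of how an SNLI-style benchmark is scored. The sentence pairs and the stand-in "model" are invented for illustration; they are not items from the actual corpus, and a real evaluation would use a trained classifier over thousands of pairs.

```python
# Invented sentence pairs in the SNLI format: (premise, hypothesis, gold label).
examples = [
    ("A man is playing a guitar.", "A person is making music.", "entailment"),
    ("A man is playing a guitar.", "The man is asleep.", "contradiction"),
    ("A man is playing a guitar.", "The man is on a stage.", "neutral"),
]

def evaluate(predict, examples):
    """Return the fraction of pairs the model labels correctly."""
    correct = sum(predict(p, h) == gold for p, h, gold in examples)
    return correct / len(examples)

# A stand-in "model" that always guesses one class, as a trivial baseline.
always_entailment = lambda premise, hypothesis: "entailment"
```

On these three examples the constant guesser scores 1/3, i.e. chance level for a three-way labelling task; the benchmark's interest lies in how far above chance a system can climb.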

In theory, benchmarks should allow for rigorous and piecemeal evaluations of AI systems, helping foster broad consensus about their abilities. But in practice benchmarks face major challenges, which only get worse as AI systems progress. High scores on benchmarks do not always translate to good real-world performance in the target domain. This means benchmarks may fail to provide reliable evidence of what they are supposed to measure, which drives further division about how impressed we should be with current AI systems.


A core symptom of this failure is the phenomenon known as benchmark saturation. New benchmarks tend to get ‘solved’ at an increasingly fast pace. AI systems achieve excellent scores – comparable or superior to human baselines – within mere months or weeks of a given benchmark's creation. This invites scepticism, particularly for benchmarks designed to be hard for existing AI systems, because we generally expect progress on challenging goals to be incremental. Is it more likely that new and improved systems suddenly leapfrog the capabilities of their predecessors, or that there is something fishy with the tasks they’re being set? When language models excel on specific benchmarks yet stumble on real-world examples, scepticism about the usefulness of those benchmarks only deepens.

There are several explanations for observed discrepancies between benchmark results and actual performance. One culprit is data contamination. This occurs when the examples the system is supposed to be tested on – including benchmark items and their solutions – leak into the data the system is trained on. AI systems such as large language models learn from an enormous amount of data scraped from the internet. This training corpus is so large that it is increasingly difficult to avoid training these models on the very tests we want to use to evaluate them after training. This is the equivalent of letting students memorize test answers ahead of an exam. Detecting contamination is not always straightforward even for well-intentioned researchers. When OpenAI released GPT-4, it reported its performance on problems from the competitive programming contest Codeforces. However, it turns out that the model performs significantly worse than reported on old or recent Codeforces problems that probably did not leak into its training data. Results on benchmarks should thus be interpreted with caution when contamination cannot be ruled out.
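One common first-pass heuristic for spotting contamination is checking whether word n-grams from a test item also appear in the training corpus. The sketch below is a simplified, hypothetical version of that idea; real detection is much harder, since leaks can be paraphrased, translated, or partial.

```python
# Simplified contamination check: flag a test item if any of its word n-grams
# also appears verbatim in the training corpus. A heuristic only; paraphrased
# or partially leaked items would slip through.
def ngrams(text, n=5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(test_item, training_corpus, n=5):
    train_grams = set()
    for doc in training_corpus:
        train_grams |= ngrams(doc, n)
    return bool(ngrams(test_item, n) & train_grams)

corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
```

Here `looks_contaminated("quick brown fox jumps over the lazy dog", corpus)` fires because a five-word sequence is shared, while an unrelated sentence passes cleanly. At the scale of web-sized training corpora, even this crude check becomes an engineering challenge in its own right.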


Benchmarks can also be gamed in more insidious ways, sanctioned by institutional practices. Chief among these is “SOTA (state-of-the-art) chasing” on benchmark leaderboards, where research and industry groups compete for top scores. Prestigious conferences fuel this competition by rewarding contributions that claim top results on popular benchmarks. This promotes the wrong incentives: benchmarks are useful for standardized comparison, but their scores should be treated as means, not ends.

When researchers forget this, they optimise their AI systems for better benchmark scores. This can lead to unintended gaming: AI systems may find shortcuts that improve benchmark performance without improving the underlying competence. Benchmarks are meant to use scoring metrics as proxies for real-world abilities, but quantitative measures tend to lose their value as proxies when researchers aim directly at them. This is an example of Goodhart's Law: when a measure becomes a target, it often ceases to be a good measure.

The case of the SNLI benchmark, described above, is particularly revealing. After combing through the corpus, researchers found that for a significant portion of sentence pairs the nature of the relation between the two sentences (entailment, contradiction, or neutrality) could be predicted by looking at only one of them! This is because the human crowd workers who created the dataset unwittingly introduced spurious correlations that had nothing to do with the task; for example, the presence of negation (‘not’) in sentences was highly correlated with contradiction. Current AI systems are much better than humans at picking up on such correlations and can thus learn "shortcuts" to beat benchmarks for the wrong reasons.
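The shortcut is easy to reproduce in miniature. The sketch below uses invented sentence pairs constructed to mimic the annotation bias described above: a "hypothesis-only" classifier that never reads the premise still labels every pair correctly, because negation perfectly predicts contradiction in this toy data.

```python
# Toy illustration of shortcut learning: a classifier that ignores the premise
# entirely, exploiting the correlation between "not" and the contradiction
# label. The data is invented to exhibit the bias, not drawn from SNLI.
biased_data = [
    ("A dog runs in a field.", "The dog is not moving.", "contradiction"),
    ("A woman reads a book.", "The woman is not reading.", "contradiction"),
    ("A child eats an apple.", "A child eats fruit.", "entailment"),
    ("A man rides a bike.", "A person is cycling.", "entailment"),
]

def hypothesis_only(premise, hypothesis):
    # A pure shortcut: never looks at the premise at all.
    return "contradiction" if "not" in hypothesis.lower().split() else "entailment"

accuracy = sum(hypothesis_only(p, h) == gold
               for p, h, gold in biased_data) / len(biased_data)
```

A model scoring 100% this way has learned nothing about inference between sentences, which is precisely why high benchmark scores can mislead.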

This points to a broader concern about what benchmarks are really supposed to measure. A well-designed test should measure some particular skill or capacity, and good test performance should generalize to relevant real-world situations. However, common benchmarks used in AI research explicitly target nebulous capacities, such as “understanding” and “reasoning”. These constructs are abstract, multifaceted, and implicitly defined with reference to human psychology. But we cannot uncritically assume that a test designed for humans can be straightforwardly adapted to evaluate language models and remain valid as an assessment of the same capacity. Humans and machines may achieve similar performance on a task through very different means, and benchmark scores alone do not tell that story.


We have now touched on a deeper conundrum that brings us back to the Turing Test: how much can we infer about how a system works and what it’s capable of merely by observing its behaviour in a limited set of circumstances? In principle, a machine could pass a five-minute Turing Test through brute force memorization; in practice, a language model can achieve superhuman performance on SNLI and other benchmarks by relying on shortcuts. In both cases, the performance of the system comes apart from the competence we wanted to assess. Behavioural tests can provide tentative support for hypotheses about the competencies that may underlie observed performance; but we cannot take these hypotheses for granted without further checks. It’s no wonder, then, that opinions on current AI systems are so profoundly divided. Some take their remarkable performance on various tasks as face-value evidence that they exhibit "sparks of general intelligence"; others dismiss it based on concerns about the reliability of the behavioural tests.

Can we have any hope of arbitrating disagreements about AI systems’ capacities? We can, but it takes time, effort, and goodwill. While there are no perfect solutions to gaming and data contamination, there are concrete steps one can take to address them. It starts with importing best practices from cognitive science into behavioural evaluations of AI. In particular, researchers must examine the background theoretical assumptions that drive design decisions and justify the link between proxy measures and real-world abilities. We must ensure that the material on which AI systems are tested differs in the relevant ways from the data they’re trained on, and we must understand the ways in which they differ. It may be preferable not to release tests publicly, to prevent data contamination. Some benchmarks already adopt this strategy. For example, François Chollet's Abstraction and Reasoning Corpus (ARC) incorporates a private test set that evaluated systems cannot be trained on.


These recommendations may increase the value of benchmarks, but they cannot lift the limits of behavioural evaluations altogether. To settle disputes about how systems like language models achieve their performance on various tasks, and whether this involves something more sophisticated than mindless memorization and shallow pattern matching, we must look beyond behaviour. We need to understand how they process information internally and uncover the causal mechanisms that explain their successes and failures on tasks we care about. This project is under way. Researchers have been developing novel techniques, partially inspired by neuroscience, to systematically intervene on internal components of AI systems and assess the causal effect of such interventions on their behaviour. This has already allowed them to identify key mechanisms in small models, although scaling up this approach to behemoths like GPT-4 remains a formidable challenge.
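The logic of such interventions can be sketched in miniature. The "model" below is a tiny hand-made network with made-up weights, not a real language model; the point is the method, in the spirit of ablation studies: zero out one internal unit, re-run the system, and measure how much the output changes. That difference is evidence about the unit's causal contribution to behaviour.

```python
# Minimal sketch of an ablation-style causal intervention. The network and
# its weights are invented for illustration.
def tiny_model(x, ablate_unit=None):
    # Hidden layer: two ReLU units with fixed, made-up weights.
    hidden = [max(0.0, 2.0 * x), max(0.0, -1.0 * x + 3.0)]
    if ablate_unit is not None:
        hidden[ablate_unit] = 0.0  # the intervention: silence one unit
    # Fixed linear readout over the hidden units.
    return 1.0 * hidden[0] + 0.5 * hidden[1]

x = 2.0
baseline_out = tiny_model(x)                # behaviour with no intervention
effect_0 = baseline_out - tiny_model(x, 0)  # causal effect of unit 0 on output
effect_1 = baseline_out - tiny_model(x, 1)  # causal effect of unit 1 on output
```

Here unit 0 carries most of the causal weight for this input. In real interpretability work the same move is made on transformer components (attention heads, neurons, residual-stream directions) across many inputs, which is where the scaling challenge bites.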

Engineers interested in using AI systems for practical purposes may be happy to settle for well-designed behavioural evaluations if they can find effective solutions to the concerns about gaming and contamination. But researchers interested in debates about the kinds of competence we can meaningfully ascribe to AI systems in various domains – and how they compare to human cognition, if at all – ought to supplement behavioural approaches with causal interventions. We must chip away at the challenge from both ends, investigating behaviour from the top down and causal mechanisms from the bottom up, with a healthy dose of theory to bridge them. In due course, this line of research is likely to dissolve the false dichotomy between hard-line scepticism and unbridled speculation about the capacities of AI systems like language models. There is a fertile middle ground ripe for exploration, anchored in rigorous and hypothesis-driven experiments.
