The Turing Tests of today are mistaken

Companies like OpenAI try to show that AIs are intelligent by hyping their high scores in behavioural tests – an approach with roots in the Turing Test. But there are hard limits to what we can infer about intelligence by observing behaviour. To demonstrate intelligence, argues Raphaël Millière, we must stop chasing high scores and start uncovering the mechanisms underlying AI systems’ behaviour.

Public discourse on artificial intelligence is divided by a widening chasm. While sceptics dismiss current AI systems as mere parlour tricks, devoid of genuine cognitive sophistication, evangelists and doomsayers view them as significant milestones on the path toward superhuman intelligence, laden with utopian potential or catastrophic risk.

This divide is symptomatic of a deeper methodological disagreement: there is no consensus among experts on how to adequately evaluate the capacities of AI systems. Researchers tend to rely on behavioural tests to evaluate AI systems’ capacities, but this methodology is flawed, and at best only partial fixes are available. In order to properly assess which capacities AI systems have, we must supplement improved behavioural tests with investigation of the causal mechanisms underlying AI behaviour.

The challenge of assessing AI systems’ intelligence invariably conjures the Turing Test. Alan Turing’s “imitation game” involves a human interrogator communicating by teleprinter with a computer and another human, and attempting to determine which is which based solely on their responses. The computer’s objective is to cause the interrogator to incorrectly identify it as the human. Turing predicted that by 2000, computers would be able to play this game well enough that the average interrogator would have no better than a 70% chance of correctly identifying the machine after five minutes of questioning.

___

In theory, benchmarks should allow for rigorous and piecemeal evaluations of AI systems, helping foster broad consensus about their abilities.

___

In the age of large language models, like those that power OpenAI’s ChatGPT, the Turing Test may seem quaint. In fact, a recent paper found that GPT-4 fooled human interrogators in 41% of trials, above the 30% rate implied by Turing’s prediction for the year 2000 – albeit with a two-decade delay. But even if future language models pass the test with flying colours, what should we make of this? It is doubtful that Turing himself intended his test to set strictly necessary and sufficient conditions for intelligence. As philosopher Ned Block emphasized, a system could in principle pass the test through brute force, answering every question by retrieving memorized answers stored in a giant look-up table. This suggests that the Turing Test provides at best defeasible evidence of intelligence.

AI research has largely moved on from the Turing Test as a holistic assessment of intelligence or cognition. It has not, however, moved on from behavioural evaluations as a whole. These days, AI systems like language models are routinely evaluated through benchmarks – standardized tests designed to assess specific capabilities, often by comparison with human baselines. Unlike the Turing Test, they provide a quantitative assessment of how AI systems perform on various tasks, facilitating a direct comparison of their abilities in a controlled and systematic way. They also avoid the one-size-fits-all approach to evaluation, allowing researchers to test different capacities separately.

APA	Millière, R. (2024, March 20). The Turing Tests of today are mistaken. IAI News. https://iai.tv/articles/the-turing-tests-of-today-are-mistaken-auid-2790
MLA	Millière, Raphaël. "The Turing Tests of today are mistaken." IAI News, 20 March 2024. https://iai.tv/articles/the-turing-tests-of-today-are-mistaken-auid-2790
