We tend to think of language, perception and thought as representing the world, pointing towards and mapping reality. But AI’s large language models suggest that this isn’t how language works, argues cognitive scientist Elan Barenholtz. These models prove that language and imagery can produce coherent continuations without ever grounding themselves in external reality. This “autogenerative” capacity was always present in language, awaiting discovery. And, Barenholtz argues, it is now the best explanation we have of how language and perception work in humans. Meaning arises not by establishing facts about the world, but rather through language’s generative role in producing further language, imagery, and, ultimately, coordinated human action.
Imagine that archaeologists unearth clay tablets from an ancient civilization, long lost to the world. There are no bilingual texts, no known descendants of the civilization, nothing to anchor a translation of the tablets. They seem to display nothing more than rows of arbitrary squiggles. Now imagine that someone claims to have decoded the squiggles. “These patterns,” they assure us, “are self-predicting. The sequence of symbols in one part of the tablet is mathematically sufficient to derive what will appear in another part.” And, indeed, they produce an algorithm which correctly predicts the text on the right side of each tablet based on the text on the left side.
The finding that the symbols contain this predictive structure would be an extraordinary insight. But we still might ask, “what do the symbols mean?” Now replace the tablets with the digital corpus we call the internet, and the algorithm with a large language model. The civilization is ours. And the question of meaning is ours too.
The technical and social impact of LLMs has been much discussed, but the models also represent a major scientific and philosophical breakthrough. In trying to replicate human speech, engineers uncovered something fundamental about the nature of language. What these systems suggest is that natural language, which we use to communicate and think and write, is self-predicting, just like the symbols on the tablets. At the mechanical level, the process is simple. An LLM takes in linguistic subcomponents called tokens and generates the next one. That token is appended to the sequence and fed back in to produce the next, and so on. And out comes language.
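For readers who want to see the loop itself, here is a minimal sketch of that predict, append, and feed-back cycle. It is a toy under loud assumptions: a real LLM predicts each token with a neural network that attends to the whole preceding sequence, whereas this stand-in only counts which word follows which in a tiny made-up corpus and looks at the last word alone. The names `train_bigram` and `generate` are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_new_tokens):
    """Autoregressive loop: predict one token, append it, feed the sequence back in."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        # Assumption: this toy conditions only on the last token.
        candidates = follows.get(sequence[-1])
        if not candidates:
            break  # nothing ever followed this token in the corpus
        next_token = candidates.most_common(1)[0][0]
        sequence.append(next_token)  # the output becomes part of the next input
    return sequence


# A tiny invented corpus, standing in for "the internet".
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 5)))  # prints "the cat sat on the cat"
```

Nothing in the loop consults anything outside the text. The only input at each step is the sequence produced so far.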
To the model, words are arbitrary marks, no different from the patterns on those ancient tablets. The model has no exposure to the kinds of information we typically associate with meaning. It knows nothing about the redness of “red” or the spatial extension of “far.” It is blind, deaf, and unembodied. Instead, it knows where “red” tends to fall in relation to every other word: its proximity to “orange,” “blood,” “firetruck,” “angry,” “stop.” It encodes these relations by giving each word an address in a high-dimensional space, called an embedding. Each point is defined only by its position relative to all the others. There is no content at the address. No meaning stored inside. Pure relations. And yet, from this manipulation of empty symbols, an LLM can learn to talk.
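The idea of an address defined purely by relations can also be sketched in a few lines. The example below is an illustration under stated assumptions, not how an LLM actually learns its embeddings: real models learn dense vectors during training, while this toy simply counts which words share a sentence in an invented five-sentence corpus. The helper names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_vectors(sentences):
    """Give each word a vector made only of counts of the words it appears with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for word, neighbor in permutations(words, 2):
            vectors[word][neighbor] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented sentences; the counts below carry no information beyond word order.
sentences = [
    "red stop sign",
    "orange stop sign",
    "red angry face",
    "far distant mountain",
    "far distant star",
]
vectors = cooccurrence_vectors(sentences)

# "red" lands nearer to "orange" than to "far" purely because of shared neighbors.
print(cosine(vectors["red"], vectors["orange"]) > cosine(vectors["red"], vectors["far"]))
```

The vector for "red" contains no color, only a record of the company the word keeps.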
___
We begin sentences without knowing how they will end. We commit to grammatical paths and backtrack when they fail. We are sometimes carried somewhere unexpected by the logic of what we were saying.
___