We tend to think of language, perception and thought as representing the world, pointing towards and mapping reality. But AI’s large language models suggest that this isn’t how language works, argues cognitive scientist Elan Barenholtz. These models prove that language and imagery can produce coherent continuations without ever grounding themselves in external reality. This “autogenerative” capacity was always present in language, awaiting discovery. And, Barenholtz argues, it is now the best explanation we have of how language and perception work in humans. Meaning arises not by establishing facts about the world, but rather through language’s generative role in producing further language, imagery, and, ultimately, coordinated human action.
Imagine that archaeologists unearth clay tablets from an ancient civilization, long lost to the world. There are no bilingual texts, no known descendants of the civilization, nothing to anchor a translation of the tablets. They seem to display nothing more than rows of arbitrary squiggles. Now imagine that someone claims to have decoded the squiggles. “These patterns,” they assure us, “are self-predicting. The sequence of symbols in one part of the tablet is mathematically sufficient to derive what will appear in another part.” And, indeed, they produce an algorithm which correctly predicts the text on the right side of each tablet based on the text on the left side.
The finding that the symbols contain this predictive structure would be an extraordinary insight. But we still might ask, “What do the symbols mean?” Now replace the tablets with the digital corpus we call the internet, and the algorithm with a large language model. The civilization is ours. And the question of meaning is ours too.
The technical and social impact of LLMs has been much discussed, but they also represent a major scientific and philosophical breakthrough. In trying to replicate human speech, engineers uncovered something fundamental about the nature of language. What these systems suggest is that natural language, which we use to communicate and think and write, is self-predicting, just like the symbols on the tablets. At the mechanical level, the process is simple. An LLM takes in linguistic subcomponents called tokens and generates the next one. That token is appended to the sequence and fed back in to produce the next, and so on. And out comes language.
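The loop described here, predict a token, append it, feed the sequence back in, can be sketched in a few lines. This is only a toy under loud assumptions: a real LLM uses a trained neural network that conditions on the whole preceding sequence, whereas this stand-in merely counts which word followed which in a tiny made-up corpus and looks only at the last token. The names `train_bigram` and `generate`, and the corpus itself, are hypothetical and chosen for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    table = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        table[current][following] += 1
    return table

def generate(table, prompt, max_new_tokens, rng):
    """Autoregressive loop: predict a token, append it, feed the sequence back in."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        # Assumption: this toy conditions only on the last token, not the full context.
        candidates = table.get(sequence[-1])
        if not candidates:  # no observed continuation, so stop early
            break
        choices, weights = zip(*candidates.items())
        # Sample the next token in proportion to how often it followed in the corpus.
        sequence.append(rng.choices(choices, weights=weights, k=1)[0])
    return sequence

# Made-up corpus; the "tokens" here are simply whitespace-separated words.
corpus = "the cat sat on the mat and the cat saw the dog".split()
table = train_bigram(corpus)
output = generate(table, ["the"], 8, random.Random(0))
print(" ".join(output))
```

Nothing in the loop consults anything but earlier symbols: every step of the continuation is drawn from the statistics of what came before.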
To the model, words are arbitrary marks, no different from the patterns on those ancient tablets. The model has no exposure to the kinds of information we typically associate with meaning. It knows nothing about the redness of “red” or the spatial extension of “far.” It is blind, deaf, and unembodied. Instead, it knows where “red” tends to fall in relation to every other word: its proximity to “orange,” “blood,” “firetruck,” “angry,” “stop.” It encodes these relations by giving each word an address in a high-dimensional space, called an embedding. Each point is defined only by its position relative to all the others. There is no content at the address. No meaning stored inside. Pure relations. And yet, from this manipulation of empty symbols, an LLM can learn to talk.
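The idea of an "address" defined purely by relations can be illustrated with a small sketch. It rests on assumptions that should be stated plainly: real LLM embeddings are dense vectors learned during training, not raw counts, and the sentences below are invented for the example. Here each word is represented only by the words it appears alongside, and closeness is measured by cosine similarity. The helper names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences):
    """Represent each word only by counts of the words it appears alongside."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, neighbour in enumerate(words):
                if i != j:
                    vectors[word][neighbour] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented two-word "sentences"; the code never learns what any word refers to.
sentences = [
    "red firetruck", "orange firetruck",
    "red sunset", "orange sunset",
    "red blood", "orange juice",
    "far horizon", "far shore",
]
vectors = cooccurrence_vectors(sentences)
sim_red_orange = cosine(vectors["red"], vectors["orange"])
sim_red_far = cosine(vectors["red"], vectors["far"])
print(sim_red_orange > sim_red_far)
```

In this toy, "red" ends up nearer to "orange" than to "far" solely because the two colour words keep the same company; no colour, distance, or other content is stored anywhere.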
___
We begin sentences without knowing how they will end. We commit to grammatical paths and backtrack when they fail. We are sometimes carried somewhere unexpected by the logic of what we were saying.
___
Join the conversation
John Bishop 1 3 June 2026
In his recent iai essay, cognitive scientist Elan Barenholtz argues that Large Language Models (LLMs) have revealed something radical about the nature of language itself: that it does not work by pointing to or describing reality, but by generating contextually appropriate continuations of itself. Barenholtz calls this the "autogenerative property of language".
Working from a very different intellectual tradition, Roy Harris, who was Professor and later Emeritus Professor of General Linguistics at Oxford, developed Integrationist Linguistics over several decades. Central to his project was a critique of the conventional conception of language as a referential code corresponding to an already constituted world. Though the two accounts converge in striking ways, important differences remain.
The most significant alignment between the two thinkers is their shared rejection of what Harris called the "language myth": the assumption, so deeply embedded it is rarely noticed, that words are fixed labels that transparently convey thoughts about an independent external world. Harris spent much of his career dismantling this referentialist, or "telementational" picture, which he traced back to the classical tradition and saw exemplified in the Saussurean model of language as a code shared among speakers. On the telementational view, communication succeeds when a thought in one mind is successfully replicated in another; language is merely the transparent conduit. Harris completely rejected this in The Language Myth, arguing that it fundamentally misrepresents what language is and what participants actually do with it.
Barenholtz arrives at a similar conclusion from a very different starting point. He observes that large language models operate without any direct exposure to the world - the proverbial "deaf, dumb and blind kid, who sure plays a mean pinball", to borrow a phrase from The Who - yet nevertheless generate coherent and contextually appropriate language. From this, he argues that grounding in external reality cannot be a fundamental prerequisite for language. As he puts it: "There is no content at the address. No meaning stored inside." What remains is a purely relational and statistical structure. "And yet, from this manipulation of empty symbols, an LLM can learn to talk."
Barenholtz is careful not to claim that the human brain literally is an LLM. Rather, he argues that the success of LLMs reveals a structural property of language itself: now that we know this property is in language, it is difficult to avoid the conclusion that this is also how language operates in us. The proposal is that the brain, too, may exploit the structure already present in language, generating words based on the predictive structure of previous words.
Harris would immediately recognise this move. For Integrationism, the sign does not carry a fixed, intrinsic meaning between speakers; meaning is constructed afresh in each communicative context. Barenholtz makes the structurally identical claim when he argues that a phrase like "there is a chair in the living room" has no unique continuation because continuation depends entirely on context. This is deeply Integrationist in spirit: context is not decoration added to a fixed meaning but constitutive of meaning itself.
A second major convergence lies in treating language as a form of doing rather than describing. Harris consistently insists that language is an activity, not a system of representations sitting apart from human action. Communication is prospective and social at its core. It is about enabling and constraining collective action, not transmitting private mental states.
Barenholtz's conclusion maps smoothly onto this: language as an autogenerative operating system, a mechanism for producing further language, imagery, and ultimately coordinated human action. Both thinkers dissolve the sharp boundary between linguistic meaning and practical activity. Hence, Barenholtz's concluding formulation, "Language doesn't mean; it does", becomes an anti-referentialist claim that Harris might even have endorsed.
Despite these similarities, the two accounts diverge in important ways. Harris was deeply sceptical of any account that treats language as a supra-individual, ahistorical structure with discoverable properties. Integrationism is radically situated: language is always the act of a lived person, here and now, with a body, a history, and a social context.
When Barenholtz argues that the autogenerative structure was already present in language awaiting discovery, Harris would object that what LLMs uncover is not language itself but structure in a corpus: a historically contingent record of prior linguistic activity. Language proper resides in situated, embodied, and continually emergent practices through which people coordinate action and make sense of their worlds.
This difference becomes most apparent in the question of embodiment. Integrationism insists on the fully embodied nature of the sign-maker. The body is not merely the vehicle through which intentions are executed; it is constitutive of the sign itself. Barenholtz acknowledges embodiment, but within his framework, it comes after language has generated the conditions that guide behaviour. From Harris's perspective, this reverses the proper order of explanation.
The sharpest difference concerns what counts as language in the first place. Harris's unit of analysis is always the individual creative act of integration; meaning is lived and made anew each time, not retrieved from a system. Barenholtz's framework instead proposes that the meaning of a statement lies in its capacity to generate contextually appropriate language, imagery, and perceptual confirmation.
Within Barenholtz's framework, these functions can be implemented computationally without presupposing any inner phenomenological perspective on the world. Harris would likely regard this as a category error. For him, one cannot reconstruct the living act of meaning-making from a statistical residue of past acts any more than one can reconstruct a conversation from a bare transcript.
Barenholtz illustrates the point with an imagined archaeological discovery: a lost language whose symbols exhibit predictive structure. Even if an algorithm could perfectly predict one portion of the text from another, we would still ask: what do the symbols mean?
What Barenholtz calls the autogenerative structure of language is, from an Integrationist perspective, the structure of an archive, not language itself. Whether a probabilistic account of prospective coordination adequately captures what Harris meant by joint integrative activity, or merely simulates its surface, remains the most productive unresolved question opened by this debate. As John Searle famously observed, the computational simulation of a rainstorm does not make anyone wet.
REFERENCES
Barenholtz, Elan (2026). "LLMs Show Language Does Not Describe Reality: The Meaning of Words Is Not Grounded in the World." iai News, 19 May 2026.
Harris, Roy (1981). The Language Myth. London: Duckworth.
Harris, Roy (1987). Reading Saussure. London: Duckworth.
Searle, John R. (1980). "Minds, Brains, and Programs." Behavioral and Brain Sciences, 3(3), 417-457.
The Who (1969). "Pinball Wizard."
PHILLIP HART 23 May 2026
The autogenerative properties of language are really interesting to me, and I hadn't considered how they might affect the way we see the relationship between language and thought. But if we see the autogenerativity of language as an emergent property of an emergent system, then it seems to me that some form of constitutive grounding is still needed to account for how language comes to take on the emergent form that it does.
I think that to understand what this constitutive structure of language looks like, we need to understand language as fundamentally a human phenomenon, with its structure created by human brains. Yes, language can in some sense be understood purely in terms of its structures (as an LLM does), but those structures didn't create themselves; they were created by human brains. If humans had never existed, language would never have existed, so even if all humans died out and only LLMs remained to process and generate language, what they would be processing is not really language in its own right, but rather a record of the linguistic systems created by humans.
To understand the structures of language, then, we need to understand how human brains learn and create language. In an important sense, the structure of language is (re)created every time it is learnt by a human, because learning a language consists in creating the connections and patterns which constitute the structures of language in our brains. We can see this in language change: it occurs partly because new learners of a language learn it slightly differently from those who already have it, and so make slightly different or new structures (this process is seen most clearly in the creation of creole languages, or in cases like Nicaraguan Sign Language, where learning a language goes hand in hand with formalising and creating its rules). Thus any account of the constitutive nature of language has to account for the ways in which human brains learn and (re)create language.
When we learn languages, we don't learn them through autogenerative tools; rather, we learn through reference, using the cognitive skills already available to us to create the structures. Crudely (and not necessarily chronologically) put, we hear our caregivers use certain words to refer to certain things (e.g. mummy and daddy), and then we start to learn and produce those words ourselves. Then we see patterns in how those words are used in relation to each other: we realise that our caregivers often pair words like mummy with action words like go or run, and thus we learn word order. It is as we start to use and apply the structures that we have observed and created in our brains that the emergent autogenerative properties start to come out. But these autogenerative properties don't come first: they emerge out of the structures that we learn through observation of how the language is used to express a particular meaning, and this meaning is ultimately rooted in the world around us.
In a sense, then, the "language" (auto)generated by LLMs is fundamentally very different from the language of humans. Because LLMs learn only the structures themselves, rather than learning them with reference to meaning in the context of the world around us, their language is almost ghostly or empty, a linguistic zombie. It's a testament to the wonder, complexity, and creative, autogenerative potentiality of language that LLMs are able to produce referenceless but otherwise apparently meaningful linguistic output. But those structures, the emergent autogenerative properties, don't come first; they are rooted in and created by human brains seeking to create meaning with reference to the world around us, and any account of language cannot ignore this.