The idea of an all-knowing computer program comes from science fiction and should stay there. Despite the seductive fluency of ChatGPT and other language models, they remain unsuitable as sources of knowledge. We must fight against the instinct to trust a human-sounding machine, argue Emily M. Bender & Chirag Shah.
Decades of science fiction have taught us that a key feature of a high-tech future is computer systems that give us instant access to seemingly limitless collections of knowledge through an interface that takes the form of a friendly (or sometimes sinisterly detached) voice. The early promise of the World Wide Web was that it might be the start of that collection of knowledge. With Meta’s Galactica, OpenAI’s ChatGPT and, earlier this year, Google’s LaMDA, it seems that the friendly language interface is just around the corner, too.
However, we must not mistake a convenient plot device—a means to ensure that characters always have the information the writer needs them to have—for a roadmap to how technology could and should be created in the real world. In fact, large language models like Galactica, ChatGPT and LaMDA are not fit for purpose as information access systems, in two fundamental and independent ways.
First, what they are designed to do is to create coherent-seeming text. They do this by being cleverly built to take in vast quantities of training data and model the ways in which words co-occur across all of that text. The result is systems that can produce text that is very compelling when we as humans make sense of it. But the systems do not have any understanding of what they are producing, any communicative intent, any model of the world, or any ability to be accountable for the truth of what they are saying. This is why, in 2021, one of us (Bender) and her co-authors referred to them as stochastic parrots.
Second, the fantasy of an all-knowing computer rests on a fundamentally flawed notion of how knowledge works. There will never be an all-inclusive, fully correct set of information that represents everything we could need to know. And even if you hope that such a thing might come to pass, it should be very clear that today’s World Wide Web isn’t it. When people seek information, we might think we have a question and are looking for the answer, but more often than not we benefit more from engaging in sense-making: refining our question, looking at possible answers, understanding the sources those answers come from and what perspectives they represent, and so on. Consider the difference between the queries “What is 70 degrees Fahrenheit in Celsius?” and “Given current COVID conditions and my own risk factors, what precautions should I be taking?”
Information seeking is more than simply getting answers as quickly as possible. Sure, many of our questions call for simple, fact-based responses, but others require some investigation. For those situations, it is important that we get to see the relevant sources and the provenance of the information. This requires more effort on the user’s part, but important cognitive and affective processes happen along the way that allow us to better understand our own needs and context, and to better assess the information we gather before we use it. We wrote about these issues in our Situating Search paper.
ChatGPT and other conversational systems that provide direct answers to one’s questions have two fundamental issues in this regard. First, these systems generate answers directly, skipping the step of showing users the sources in which they could look for answers. Second, these systems provide responses in conversational natural language, something we otherwise experience only with other humans: over both evolutionary time and every individual’s lived experience, natural-language back-and-forth has always been with fellow human beings. When we encounter synthetic language output, it is very difficult not to extend trust in the same way we would with a human. We argue that systems need to be very carefully designed so as not to abuse this trust.
Since the release of ChatGPT, we have seen widespread, breathless reports of what people have been able to use it to do, and we are very concerned about how this technology is presented to the public. Even with non-conversational search engines, we know that it is common to place undue trust in the results: if the search system places something at the top of the list, we tend to believe it is a good or true or representative result, and if it doesn’t find something, it is tempting to believe that the thing does not exist. But, as Safiya Noble warns us in Algorithms of Oppression, these platforms aren’t neutral reflections of either the world as it is or the way people talk about the world; rather, they are shaped by various corporate interests. It is urgent that we as a public learn to conceptualize the workings of information access systems and, in this moment especially, that we recognize that an overlay of apparent fluency does not, despite appearances, entail accuracy, informational value or trustworthiness.