What AIs Say Doesn’t Mean Anything At All

Emily Bender has claimed that large language models (LLMs) are “stochastic parrots”: they repeat their training data with statistical variations rather than use language the way humans do. LLMs are like parrots in that neither produces meaningful utterances that refer to objects in the world. Bender writes that meaning requires communicative intent—a relationship between language and the world outside of language. Form is not sufficient for meaning, where “form” means observable instances of language: marks on a page, articulated sounds, pixels on a screen. Just as infants do not learn language from form alone, but from interactions with caregivers and others, we should not expect LLMs to have learned language from form alone.

I agree with Bender’s characterization of LLMs. LLM utterances have meaning only when we interpret them as such—when we give them meaning. What worries me is that the issue, as framed in Bender’s academic work and in the discussion around it, seems to turn on a dispute about what meaning is. If we side with Bender’s opponents, like Christopher Manning at Stanford, and understand the meaning of a word to be “simply a description of the contexts in which it appears” (as quoted by Elizabeth Weil), then LLM utterances have rather a lot of meaning. LLMs receive ample context through their training data, and they’re able to describe that context. Bender points to the importance of interaction with caregivers when infants learn language; Manning tells us that infants learn in a self-supervised way, like LLMs. Bender focuses on the embodied context of language; Manning says the information encoded in gestures and facial expressions is “marginal.”
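To make the view Manning is gesturing at more concrete, here is a minimal sketch of the distributional picture of meaning. It is my own toy illustration, not code from Manning or Bender: represent each word by the contexts it appears in, and call two words similar in meaning when their context profiles are similar.

```python
# Toy illustration of the distributional view of meaning:
# a word is characterized by the contexts in which it appears.
from collections import Counter, defaultdict
from math import sqrt

corpus = [
    "the parrot repeats the phrase",
    "the parrot mimics the phrase",
    "the child repeats the word",
    "the child understands the word",
]

window = 2  # how many neighbors on each side count as "context"
contexts = defaultdict(Counter)

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                contexts[word][tokens[j]] += 1

def cosine(a: Counter, b: Counter) -> float:
    """Similarity of two context vectors; higher means more shared contexts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "parrot" and "child" end up similar because they occur in similar contexts,
# which is all the distributional account has to say about their meaning.
print(cosine(contexts["parrot"], contexts["child"]))
```

On this view, an LLM is roughly this idea scaled up: dense learned vectors and a neural network in place of raw co-occurrence counts, but context is still all there is.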

When I read Bender’s “octopus test” example (pp. 5188–89), even I wasn’t convinced her argument was decisive. It seemed that her octopus could pass the Turing test she describes if it had more information about the problem its human interlocutor was trying to solve (in this case, how sticks can be used to fend off a predator). This is where Manning and others in his camp are coming from: LLMs have so much information about every topic now, and they’ve shown such tremendous success in drawing on that information to produce seemingly meaningful responses, that it behooves us to rethink what meaning is.

This kind of debate, where the answer seems to turn on one’s preconceptions and on which evidence one points to, is common to many dicey philosophical problems. But there’s a fundamental feature of meaning that’s either being overlooked or being clarified insufficiently well, one more basic than the question of how much information extra-linguistic forms like facial expressions and gestures convey. In order for an utterance to have meaning, its speaker must have beliefs about what other people believe, and LLMs don’t have a theory of mind.

My reasoning is related to a claim that Bender makes in the paper linked above: “the process of acquiring a linguistic system, like human communication generally, relies on joint attention and intersubjectivity: the ability to be aware of what another human is attending to and guess what they are intending to communicate.” But work needs to be done to clarify what this means and why joint attention and intersubjectivity are so important.

This is where Donald Davidson’s idea of radical interpretation is helpful. He asks us to consider what’s required to learn a language from scratch. Suppose a learner is dropped into a completely novel linguistic situation—say, the Amazonian rainforest—with no prior knowledge of the vernacular. He encounters a tribesman, who points at a jaguar and says “gavagai” (the word is borrowed from Quine’s famous thought experiment about radical translation). What does “gavagai” mean? We might assume—and the learner might assume—that “gavagai” means jaguar. But “gavagai” could mean all sorts of things: the color of the jaguar, its quantity, shape, or size, that it belongs to the tribesman, a cause of the jaguar, something the jaguar is part of, and so on. There’s no instruction manual for what kind of thing the tribesman is pointing at (if there were, that too would have to be interpreted!). Interpretation is indeterminate, to use Davidson’s term. Wittgenstein makes the same point in the early sections of his Philosophical Investigations, where he discusses grammatical kinds: pointing only has meaning when we understand what kind of thing we are pointing at.

So how do we break into a language, how do we build up meaning at all, if we have no way of getting to the “right” interpretation? We have to have some way of believing that the tribesman is assenting to propositions, that he has beliefs. Maybe his facial expressions or his tone of voice clue us in (is he excited? in pain?). Maybe he isn’t pointing, and we rely instead on the orientation of his body or the movements of his eyes. Maybe the jaguar is hot pink, so we assume he means the jaguar’s color. Maybe the jaguar is nuzzling his leg, so we assume he means a relationship he has to the jaguar, e.g. that the jaguar is his pet.

Somehow, we make an assumption about what the tribesman believes. Maybe that’s, “There is a jaguar.” We ascribe that belief to him and so come to believe that “gavagai” means jaguar. Or if the jaguar is hot pink, we assume the tribesman believes, “That jaguar is hot pink,” and that “gavagai” means hot pink. Perhaps future interactions with the tribesman will confirm either of these assumptions. Perhaps they won’t, and we will revise our assumption.

The point is that it is only by ascribing our own beliefs to the tribesman—by developing an intersubjectivity for us to live in together—that we’re able to break into his language at all. Without building this intersubjectivity through joint attention, we have no way to mean anything. Without intersubjectivity, there are just forms—sounds, lips moving, pointing—that are indeterminate and meaningless.

What does this mean for LLMs? Once we understand that meaning only comes from assumptions about the beliefs of another (gut reactions we all more or less share), we realize that LLMs don’t have the ability to get the process of meaning started. It’s true that, like all forms of self-supervised machine learning, LLMs make assumptions. They calculate which next word is statistically likely, given a prompt and their training data, and so assume that it is a correct next word. But they don’t make assumptions about the beliefs of others.
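To be concrete about the kind of assumption that is being made, here is a toy sketch: a simple bigram counter standing in for a real LLM, which learns its statistics with a neural network rather than a lookup table. It picks a statistically likely next word from its training data, and that is the only assumption it makes.

```python
# Toy next-word predictor: the only "assumption" it makes is statistical,
# namely that a frequent continuation in the training data is a correct one.
import random
from collections import Counter, defaultdict

training_data = (
    "the tribesman points at the jaguar and says gavagai . "
    "the jaguar is hot pink . the jaguar nuzzles his leg ."
)

# Count which words follow which in the training data (a crude stand-in for
# the statistics a real LLM learns from its corpus).
follows = defaultdict(Counter)
tokens = training_data.split()
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

def next_word(prompt: str) -> str:
    """Return a likely next word given the prompt's last word, weighted by frequency."""
    last = prompt.split()[-1]
    candidates = follows.get(last)
    if not candidates:
        return "<unknown>"
    words, counts = zip(*candidates.items())
    return random.choices(words, weights=counts, k=1)[0]

# The model "assumes" a likely continuation; it ascribes no beliefs to any speaker.
print(next_word("the jaguar"))
```

However the statistics are learned, nothing in this process involves ascribing a belief to anyone.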