The Cave Allegory Revisited: Understanding GPT’s Worldview
A short post describing a metaphor I find useful, in particular for explaining some intuitions about systems like GPT to people who don’t have deeper technical knowledge about large generative models.
Plato’s allegory of the cave has been a staple of philosophical discourse for millennia, providing a metaphor for understanding the limits of human perception. In the classical allegory, we are prisoners shackled to the wall of a cave, unable to experience reality directly and able to infer it only by watching the shadows cast on the wall.[1]
GPT can be thought of as a blind oracle residing in a deeper cave, where it does not even see the shadows but only hears our conversations in the first cave, always trying to predict the next syllable.
It is remarkable that it still learns a lot about the world outside the cave. Why does it learn this? Because a model of reality outside the cave, and a decent amount of abstraction, are useful for predicting the conversations in the first cave!
Moreover, GPT also learns about the speakers in the first cave, since understanding their styles and patterns of speech is crucial for its prediction task. Because the speakers are closer to GPT, understanding their styles is in some sense easier and more natural than guessing what lies outside the cave.
What does the second cave allegory illustrate?
The first insight from the allegory is that if you are in GPT’s place, part of the difficulty in figuring out what’s going on outside the cave is that the people in the first cave talk about a lot of things apart from the shadows of the real world. Sometimes they talk about happenings in Middle Earth, or about how the shadows would look in some counterfactual world.
As humans, we are blessed with the luxury of being able to compare such statements to the shadows and determine their veracity. The difference between conversations about fantasy and conversations about the shadows of the real world is usually extremely obvious to humans: we never see dragon shadows. Dragons do, however, show up a lot in the conversations in the first cave, and GPT doesn’t get to see the shadows; so, to be good at predicting the conversation, it often needs to stay deeply uncertain about whether the speaker is describing the actual shadows or something else.
The second insight is that one of the biggest challenges GPT faces in figuring out a conversation is localising it: determining who is speaking and what the context is, just from the words. Is it a child regaling another child with a fairy tale, or a CEO delivering a corporate address? As humans we do not face this conundrum often, because we can see the context in which the conversation is taking place. In fact, we would likely be worse than GPT at the task it has to deal with.
At first, interacting with this type of blind oracle in the second cave was disorienting for humans. Talking to GPT used to be a bit like shouting something through a narrow tunnel into the second cave and, instead of an echo, getting back whatever the blind oracle hallucinates as the most likely thing that you or someone else would say next. This often confused people: they shouted instructions and expected an answer, but the oracle doesn’t listen to instructions or produce answers directly; it just hallucinates what someone might say next. Because, on average, questions in the conversations in the first cave are followed by answers, and requests by fulfilment, this sort of works.
One innovation of ChatGPT, which made it popular with people, was localising the conversation by default: when you are talking with ChatGPT now, it knows that what follows is a conversation between a human—you—and a “helpful AI assistant”. There is a subtle point to understand: this does not make ChatGPT the helpful assistant it is talking about. Deep down, it is still the oracle one cave deeper, but now hallucinating what a “helpful AI assistant” would say, if living in the first cave. Stretching the metaphor a bit, it’s as though the entrance to the tunnel to the second cave has been fitted with a friendly, smiling, mechanical doll.
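To make the “localising by default” point concrete, here is a minimal sketch of how a plain next-token predictor can be pointed at the “helpful AI assistant” frame simply by prepending that frame to the prompt. It assumes the Hugging Face transformers library and the small gpt2 base model, neither of which is mentioned in the post, and it is an illustration rather than ChatGPT’s actual setup: the model is still only predicting likely continuations of text, not “being” the assistant.

```python
# A minimal sketch: a base language model continues a conversational frame.
# Assumes the Hugging Face `transformers` library and the `gpt2` base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

user_message = "What causes the seasons on Earth?"

# The "friendly doll" at the tunnel entrance: a fixed frame telling the
# oracle that what follows is a dialogue with a helpful AI assistant.
prompt = (
    "The following is a conversation between a human and a helpful AI assistant.\n"
    f"Human: {user_message}\n"
    "Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt")

# The model just predicts likely next tokens; the frame makes "what the
# assistant would say next" the most likely continuation.
output_ids = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```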
The third, and possibly most important, insight is that the GPT oracle’s residence in the second cave is not a given fact of nature. In the not-too-distant future, it is easy to imagine oracles that would not only predict words but also see the shadows directly, or even act in the world. Such systems would have a clearer incentive to understand what’s real, and would get better at it.
Rose Hadshar and other members of the ACS research group helped with writing this post and provided comments and discussion on the draft. Thanks!
[1] In a take on the allegory inspired by contemporary cognitive science, perhaps the more surprising fact to note is not “we do not have direct access to reality”, but “even just by watching the shadows, we learn a lot about reality and build a decent amount of abstraction”. According to the theory of predictive processing, “predicting the shadows” is in fact a large part of what our minds do, and they build complex generative models of the world based on this task. Having an implicit world model (that is, a model of the reality outside the cave) is ultimately useful for taking actions, making decisions, and prospering as evolved animals.
One thing I’ve always found inaccurate/confusing about the cave analogy is that humans interact with what they perceive (whether it’s reality or shadows), as opposed to the prisoner, who can only passively observe the projections on the wall. Even if we can’t perceive reality directly (whatever that means), by interacting with it we can poke at it to gain a much better understanding of things (e.g. experiments). This extends to things we can’t directly see or measure.
ChatGPT is similar to the prisoner in the first cave, having the ability to observe the real world, or some representation of it, but not (yet) interacting with it to learn (if we only consider it to learn at training time).
But aren’t there very occasionally real, physical shadows shaped like a stereotypical image of a dragon?
A child might even be fooled by one after seeing images of dragons in a picture book. It’s only when we grow up that we realize the shadow itself might not reflect a single discrete object.
Mostly unrelated: I’m curious about the page you linked to, https://acsresearch.org/
As far as I can see, it is a fun site with a network simulation but without any explanation. I’d have liked to see an about page with the stated goals of ACS (or simply a link to your introductory post) so I can point to that site when talking about you.
Great post! However, I did notice that you discussed the same topic as a paper my colleague and I recently posted online. While it’s always good to share new ideas, it’s important to also give credit where it’s due. It would have been much appreciated if you had cited our work and provided a link for reference. Our paper can be found here.
Thank you for considering this in the future.
Thanks for the comment. I hadn’t noticed your preprint before your comment, but it’s probably worth noting that I described the point of this post in a Facebook post on 8 Dec 2022; this LW/AF post is just a bit more polished and referenceable. As your paper had zero influence on the writing of this post, and the content predates your paper by a month, I don’t see a clear case for citing your work.