GPTs are Predictors, not Imitators
(Related text posted to Twitter; this version is edited and has a more advanced final section.)
Imagine yourself in a box, trying to predict the next word—assign as much probability mass to the next token as possible—for all the text on the Internet.
Koan: Is this a task whose difficulty caps out at human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder? (If you don’t have an answer, maybe take a minute to generate one, or alternatively, try to predict what I’ll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)
Consider that somewhere on the Internet is probably a list of triples: <product of two prime numbers, first prime, second prime>.
GPT obviously isn’t going to predict that successfully for significantly-sized primes, but it illustrates the basic point:
There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator’s next token.
Indeed, in general, you’ve got to be more intelligent to predict particular X, than to generate realistic X. GPTs are being trained to a much harder task than GANs.
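As a toy illustration of that asymmetry, here is a hedged sketch of the prime-triple example above (not from the original text; the helper names are mine): generating an entry of such a list is a single multiplication, while predicting the last two elements from the first requires factoring, which becomes intractable at cryptographic sizes.

```python
# Sketch of the generator/predictor asymmetry for <p*q, p, q> triples.
# Generating a plausible entry is one multiplication; "predicting" the
# continuation from the product alone means factoring it.
from sympy import randprime, factorint

def generate_triple(bits=24):
    """Easy direction: pick two primes and multiply them."""
    p = randprime(2**(bits - 1), 2**bits)
    q = randprime(2**(bits - 1), 2**bits)
    while q == p:
        q = randprime(2**(bits - 1), 2**bits)
    return p * q, p, q

def predict_continuation(product):
    """Hard direction: recover the primes from the product.
    Works for small numbers; hopeless for significantly-sized primes."""
    p, q = sorted(factorint(product))  # factorint returns {prime: exponent}
    return p, q

n, p, q = generate_triple(bits=24)  # small enough to factor quickly
assert predict_continuation(n) == tuple(sorted((p, q)))
```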
Same spirit: <Hash, plaintext> pairs, which you can’t predict without cracking the hash algorithm, but which you could far more easily generate typical instances of if you were trying to pass a GAN’s discriminator about it (assuming a discriminator that had learned to compute hash functions).
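The same point in code form, as a minimal sketch of my own (not from the post; function names are illustrative): producing a typical <hash, plaintext> pair is one library call, while predicting the plaintext from the hash alone amounts to a preimage search.

```python
# Generating realistic <hash, plaintext> pairs vs. predicting plaintext from hash.
import hashlib
import secrets

def generate_pair():
    """GAN-style generation: choose any plaintext, then hash it."""
    plaintext = secrets.token_hex(8)
    digest = hashlib.sha256(plaintext.encode()).hexdigest()
    return digest, plaintext

def predict_plaintext(digest, candidates):
    """GPT-style prediction: given the hash, find the plaintext that follows it.
    With no small candidate set, this is a preimage attack on SHA-256."""
    for candidate in candidates:
        if hashlib.sha256(candidate.encode()).hexdigest() == digest:
            return candidate
    return None

digest, plaintext = generate_pair()
assert predict_plaintext(digest, candidates=[plaintext, "decoy"]) == plaintext
```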
Consider that some of the text on the Internet isn’t humans casually chatting. It’s the results section of a science paper. It’s news stories that say what happened on a particular day, where maybe no human would be smart enough to predict the next thing that happened in the news story in advance of it happening.
As Ilya Sutskever compactly put it, to learn to predict text is to learn to predict the causal processes of which the text is a shadow.
Lots of what’s shadowed on the Internet has a *complicated* causal process generating it.
Consider that sometimes human beings, in the course of talking, make errors.
GPTs are not being trained to imitate human error. They’re being trained to *predict* human error.
Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you’ll make.
If you then ask that predictor to become an actress and play the character of you, the actress will guess which errors you’ll make, and play those errors. If the actress guesses correctly, it doesn’t mean the actress is just as error-prone as you.
Consider that a lot of the text on the Internet isn’t extemporaneous speech. It’s text that people crafted over hours or days.
GPT-4 is being asked to predict it in 200 serial steps or however many layers it’s got, just as if a human were extemporizing their immediate thoughts.
A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.
Or maybe simplest:
Imagine somebody telling you to make up random words, and you say, “Morvelkainen bloombla ringa mongo.”
Imagine a mind of a level (where, to be clear, I’m not saying GPTs are at this level yet)...
Imagine a Mind of a level where it can hear you say ‘morvelkainen bloombla ringa’, and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is ‘mongo’.
The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.
When you’re trying to be human-equivalent at writing text, you can just make up whatever output, and it’s now a human output because you’re human and you chose to output that.
GPT-4 is being asked to predict all that stuff you’re making up. It doesn’t get to make up whatever. It is being asked to model what you were thinking—the thoughts in your mind whose shadow is your text output—so as to assign as much probability as possible to your true next word.
Figuring out that your next utterance is ‘mongo’ is not mostly a question, I’d guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying ‘mongo’. Figuring out exactly who’s talking, to that degree, is a hard inference problem which seems like noticeably harder mental work than the part where you just say ‘mongo’.
When you predict how to chip a flint handaxe, you are not mostly a causal process that behaves like a flint handaxe, plus some computationally weaker thing that figures out which flint handaxe to be. It’s not a problem that is best solved by “have the difficult ability to be like any particular flint handaxe, and then easily figure out which flint handaxe to be”.
GPT-4 is still not as smart as a human in many ways, but it’s naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.
And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising—even leaving aside all the ways that gradient descent differs from natural selection—if GPTs ended up thinking the way humans do, in order to solve that problem.
GPTs are not Imitators, nor Simulators, but Predictors.
The post showcases the inability of the aggregate LW community to recognize locally invalid reasoning: while the post reaches a correct conclusion, the argument leading to it is locally invalid, as explained in the comments. The high karma, including high Alignment Forum karma, shows that the combination of a famous author and a correct conclusion wins over the argument actually being correct.
The OP’s argument boils down to: the text prediction objective doesn’t stop incentivizing higher capabilities once you get to human-level capabilities. This is a valid counter-argument to: GPTs will cap out at human capabilities because humans generated the training data.
Your central point is:
You are misinterpreting the OP by thinking it’s about comparing the mathematical properties of two tasks, when it was just pointing at the loss gradient of the text prediction task (at the location of a ~human capability profile). The OP works through text prediction sub-tasks where it’s obvious that the gradient points toward higher-than-human inference capabilities.
You seem to focus too hard on the minima of the loss function:
You’re correct to point out that the minima of a loss function don’t tell you much about the actual loss that could be achieved by a particular system. Like you say, the particular boundedness and cognitive architecture are more relevant to this question. But this is irrelevant to the argument being made, which is about whether the text prediction objective stops incentivizing improvements above human capability.
I think a better lesson to learn is that communication is hard, and therefore we should try not to be too salty toward each other.
The question is not about the very general claim, or the general argument, but about this specific reasoning step.
I do claim this is not locally valid, that’s all (and recommend reading the linked essay). I do not claim that the broad argument (that the text prediction objective doesn’t stop incentivizing higher capabilities once you get to human-level capabilities) is wrong.
I do agree communication can be hard, and maybe I misunderstand the two quoted sentences, but it seems very natural to read them as making a comparison between tasks at the level of math.
I don’t understand the problem with this sentence. Yes, the task is harder than the task of being a human (as good as a human is at that task). Many objectives that humans optimize for are also not optimized to 100%, and as such, humans also face many tasks that they would like to get better at, which are therefore harder than the task of simply being a human. Indeed, if you optimized an AI system on those, you would also get no guarantee that the system would end up only as competent as a human.
This is a fact about practically all tasks (including things like calculating the nth-digit of pi, or playing chess), but it is indeed a fact that lots of people get wrong.
(I affirm this as my intended reading.)
There are multiple ways to interpret “being an actual human”. I interpret it as pointing at an ability level.
“the task GPTs are being trained on is harder” ⇒ the prediction objective doesn’t top out at (i.e. the task has more difficulty in it than).
“than being an actual human” ⇒ the ability level of a human (i.e. the task of matching the human ability level at the relevant set of tasks).
Or as Eliezer said:
In different words again: the tasks GPTs are being incentivised to solve aren’t all solvable at a human level of capability.
You almost had it when you said:
It’s more accurate if I edit it to:
- Maybe you mean something like task + performance threshold. Here ‘predict the activation of photoreceptors in human retina [text] well enough to be able to function as a typical human’ is clearly less difficult than task + performance threshold ‘predict next word on the internet, almost perfectly’.
You say it’s not particularly informative. Eliezer responds by explaining the argument it responds to, which provides the context in which this is an informative statement about the training incentives of a GPT.
Does this look like a motte-and-bailey to you?
1. Bailey: GPTs are Predictors, not Imitators (nor Simulators).
2. Motte: The training task for GPTs is a prediction task.
The title and the concluding sentence both plainly advocate for (1), but it’s not really touched by the overall post, and I think it’s up for debate (related: reward is not the optimization target). Instead there is an argument for (2). Perhaps the intention of the final sentence was to oppose Simulators? If that’s the case, cite it, be explicit. This could be a really easy thing for an editor to fix.
Does this look like a motte-and-bailey to you?
1. Bailey: The task that GPTs are being trained on is … harder than the task of being a human.
2. Motte: Being an actual human is not enough to solve GPT’s task.
As I read it, (1) is false: the task of being a human doesn’t cap out at human intelligence. More intelligent humans are better at minimizing prediction error, achieving goals, gaining inclusive genetic fitness, or whatever else you might think defines “the task of being a human”. In the comments, Yudkowsky retreats to (2), which is true. But then how should I understand this whole paragraph from the post?
If we’re talking about how natural selection trained my genome, why are we talking about how well humans perform the human task? Evolution is optimizing over generations. My human task is optimizing over my lifetime. Also, if we’re just arguing for different thinking, surely it mostly matters whether the training task is different, not whether it is harder?
Overall I think “Is GPT-N bounded by human capabilities? No.” is a better post on the mottes and avoids staking out unsupported baileys. This entire topic is becoming less relevant because AIs are getting all sorts of synthetic data and RLHF and other training techniques thrown at them. The 2022 question of the capabilities of a hypothetical GPT-N that was only trained on the task of predicting human text is academic in 2024. On the other hand, it’s valuable for people to practice on this simpler question before moving on to harder ones.
As someone who expects LLMs to be a dead end, I nonetheless think this post makes a valid point and does so using reasonable, easy-to-understand arguments. I voted +1.