The post showcases the inability of the aggregate LW community to recognize locally invalid reasoning: while the post reaches a correct conclusion, the argument leading to it is locally invalid, as explained in comments. The high karma and high Alignment Forum karma show that a combination of famous author and correct conclusion wins over the argument being correct.
The OP argument boils down to: the text prediction objective doesn’t stop incentivizing higher capabilities once you get to human-level capabilities. This is a valid counter-argument to: GPTs will cap out at human capabilities because humans generated the training data.
Your central point is:
Where GPT and humans differ is not some general mathematical fact about the task, but differences in what sensory data a human and a GPT are trying to predict, and differences in cognitive architecture and the ways the systems are bounded.
You are misinterpreting the OP by thinking it’s about comparing the mathematical properties of two tasks, when it was just pointing at the loss gradient of the text prediction task (at the location of a ~human capability profile). The OP works through text prediction sub-tasks where it’s obvious that the gradient points toward higher-than-human inference capabilities.
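A minimal numerical sketch of that gradient claim (toy numbers and a hypothetical "human-level" predictor I made up, assuming numpy; nothing here is from the OP): at a sub-task where the true next-token distribution is sharper than what a human-level predictor outputs, the gradient of the cross-entropy loss with respect to the model’s logits is still nonzero, i.e. the objective keeps pushing past the human capability profile.

```python
import numpy as np

# Toy next-token sub-task over 4 candidate tokens.
# True distribution: the next token is almost fully determined by the context
# (e.g. the result of a calculation embedded in the text).
p_true = np.array([0.97, 0.01, 0.01, 0.01])

# Hypothetical "human-level" predictor: gets the gist but hedges more.
human_logits = np.log(np.array([0.70, 0.10, 0.10, 0.10]))
q_human = np.exp(human_logits) / np.exp(human_logits).sum()

# Expected cross-entropy loss, and its gradient w.r.t. the logits.
# For softmax + cross-entropy the gradient is simply q - p.
loss = -(p_true * np.log(q_human)).sum()
grad = q_human - p_true

print(f"loss at the human-level predictor: {loss:.3f} nats")
print(f"irreducible loss (entropy of p):   {-(p_true * np.log(p_true)).sum():.3f} nats")
print(f"gradient w.r.t. logits: {grad}")  # nonzero, so training keeps pushing
```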
You seem to focus too hard on the minima of the loss function:
notice that “what would the loss function like the system to do” in principle tells you very little about what the system will do
You’re correct to point out that the minima of a loss function don’t tell you much about the actual loss that could be achieved by a particular system. As you say, the particular boundedness and cognitive architecture are more relevant to that question. But this is irrelevant to the argument being made, which is about whether the text prediction objective stops incentivising improvements above human capability.
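As a worked form of that distinction (a standard identity, not something from the thread): the expected next-token loss of a predictor q against the true distribution p decomposes as

```latex
\mathbb{E}_{x \sim p}\bigl[-\log q(x)\bigr] \;=\; H(p) \;+\; D_{\mathrm{KL}}(p \,\|\, q)
```

The location and value of the minimum (q = p, with loss H(p)) say nothing about how close a particular bounded system can actually get, which is your point; but for any human-level q that differs from p the KL term is positive, so the objective still rewards improvement past that point, which is all the OP’s argument needs.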
The post showcases the inability of the aggregate LW community to recognize locally invalid reasoning
I think a better lesson to learn is that communication is hard, and therefore we should try not to be too salty toward each other.
The question is not about the very general claim, or the general argument, but about this specific reasoning step:
GPT-4 is still not as smart as a human in many ways, but it’s naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.
And since the task that GPTs are being trained on is different from and harder than the task of being a human, ….
I do claim this is not locally valid, that’s all (and I recommend reading the linked essay). I do not claim that the broad argument (that the text prediction objective doesn’t stop incentivizing higher capabilities once you get to human-level capabilities) is wrong.
I do agree communication can be hard, and maybe I misunderstand the two quoted sentences, but it seems very natural to read them as making a comparison between tasks at the level of math.
I don’t understand the problem with this sentence. Yes, the task is harder than the task of being a human (as good as a human is at that task). Many objectives that humans optimize for are also not optimized to 100%, and as such humans also face many tasks that they would like to get better at, tasks which are therefore harder than the task of simply being a human. Indeed, if you optimized an AI system on those, you would also get no guarantee that the system would end up only as competent as a human.
This is a fact about practically all tasks (including things like calculating the nth digit of pi, or playing chess), but it is indeed a fact that lots of people get wrong.
(I affirm this as my intended reading.)
There are multiple ways to interpret “being an actual human”. I interpret it as pointing at an ability level.
- “the task GPTs are being trained on is harder” ⇒ the prediction objective doesn’t top out at (i.e. the task has more difficulty in it than).
- “than being an actual human” ⇒ the ability level of a human (i.e. the task of matching the human ability level at the relevant set of tasks).
Or as Eliezer said:
I said that GPT’s task is harder than being an actual human; in other words, being an actual human is not enough to solve GPT’s task.
In different words again: the tasks GPTs are being incentivised to solve aren’t all solvable at a human level of capability.
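A small sketch of that claim (toy numbers, numpy, purely illustrative; the two "author" distributions are hypothetical): even on plain text prediction, a single human author used as a predictor of text written by many different humans is not at the loss floor, because the predictor also has to model which kind of author it is reading, something an actual human never has to do in order to be themselves.

```python
import numpy as np

def cross_entropy(p, q):
    """Expected -log q(x) for x drawn from p, in nats."""
    return -(p * np.log(q)).sum()

# Two hypothetical author styles over a 3-token vocabulary.
author_a = np.array([0.7, 0.2, 0.1])
author_b = np.array([0.1, 0.2, 0.7])

# The training stream mixes both authors equally.
mixture = 0.5 * author_a + 0.5 * author_b

# Predicting "as author A" (being one particular human) vs.
# modelling the whole distribution the corpus was drawn from.
loss_as_one_human = cross_entropy(mixture, author_a)
loss_as_modeller  = cross_entropy(mixture, mixture)

print(f"loss when predicting like one human author: {loss_as_one_human:.3f} nats")
print(f"loss when modelling the full distribution:  {loss_as_modeller:.3f} nats")
```

A predictor that additionally infers the author from context does better still, which is the sense in which the tasks GPTs are trained on contain more difficulty than the human ability level.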
You almost had it when you said:
- Maybe you mean something like task + performance threshold. Here ‘predict the activation of photoreceptors in human retina well enough to be able to function as a typical human’ is clearly less difficult than task + performance threshold ‘predict next word on the internet, almost perfectly’. But this comparison does not seem to be particularly informative.
It’s more accurate if I edit it to:
- Maybe you mean something like task + performance threshold. Here ‘predict the ~~activation of photoreceptors in human retina~~ [text] well enough to be able to function as a typical human’ is clearly less difficult than task + performance threshold ‘predict next word on the internet, almost perfectly’.
You say it’s not particularly informative. Eliezer replies by explaining the argument the OP was countering, which provides the context in which this is an informative statement about the training incentives of a GPT.