I have very little knowledge of AI or the mechanics behind GPT, so this is more of a question than a criticism:
If a scaled up GPT-N is trained on human-generated data, how would it ever become more intelligent than the people whose data it is trained on?
Yeah, that’s a good question. It’s similar to training image classifiers on human-labelled data – they can become cheaper than humans, and they can become more consistent than humans (i.e., since humans make uncorrelated errors, the answer that most humans would pick can be systematically better than the answer that a random human would pick), but they can’t gain vastly superhuman classification abilities.
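To make the uncorrelated-errors point concrete, here’s a toy simulation (my own illustration; the numbers are made up purely for the example): if each labeller is independently right 70% of the time, an 11-person majority vote is right roughly 92% of the time, so a model trained on aggregated labels has a better target than any single human.

```python
import random

random.seed(0)

N_ITEMS = 10_000     # hypothetical number of items to label
N_LABELLERS = 11     # hypothetical panel size (odd, so no ties)
P_CORRECT = 0.7      # assumed accuracy of each individual human

single_correct = 0
majority_correct = 0

for _ in range(N_ITEMS):
    true_label = random.randint(0, 1)
    # Each human independently reports the true label with prob P_CORRECT;
    # crucially, their errors are uncorrelated with each other.
    votes = [true_label if random.random() < P_CORRECT else 1 - true_label
             for _ in range(N_LABELLERS)]
    single_correct += votes[0] == true_label           # one random human
    majority = 1 if sum(votes) > N_LABELLERS // 2 else 0
    majority_correct += majority == true_label         # the panel's vote

print(f"single human accuracy:  {single_correct / N_ITEMS:.3f}")   # ~0.70
print(f"majority-vote accuracy: {majority_correct / N_ITEMS:.3f}") # ~0.92
```

The vote can’t recover information that no human has, though, which is why this route tops out around best-human-consensus performance rather than going far beyond it.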
In this case, one plausible route to outperforming humans would be to start out with a GPT-like model, and then finetune it on some downstream task in an RL-like fashion (see e.g. this). I don’t see any reason why modelling the internet couldn’t lead to latent superhuman ability, and finetuning could then be used to teach the model to use its capabilities in ways that humans wouldn’t. Indeed, there’s certainly no single human who could optimally predict every next word of internet-text, so optimal performance on the training task would require the model to become superhuman on at least that task.
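To be concrete about what finetuning “in an RL-like fashion” means mechanically, here’s a heavily stripped-down sketch (a toy REINFORCE loop over a three-word vocabulary standing in for a pretrained model and a downstream reward; none of this is from the linked work): the update increases the probability of outputs the reward function likes, so the supervision signal is the reward rather than a human demonstration.

```python
import torch

VOCAB = ["good", "bad", "ok"]    # hypothetical toy vocabulary
SEQ_LEN = 4

def reward_fn(seq):
    # Assumed stand-in for some downstream task reward.
    return float(seq.count("good"))

# Stand-in for pretrained weights: one logit per vocabulary item.
logits = torch.zeros(len(VOCAB), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    dist = torch.distributions.Categorical(probs)
    tokens = dist.sample((SEQ_LEN,))          # sample a "generation"
    seq = [VOCAB[t] for t in tokens]
    reward = reward_fn(seq)
    # REINFORCE: scale the sample's log-likelihood by its reward.
    loss = -reward * dist.log_prob(tokens).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, nearly all probability mass sits on "good".
print(dict(zip(VOCAB, torch.softmax(logits, dim=0).tolist())))
```

The point of the toy is just the shape of the update: nothing in the loop caps the policy at human level, so if the pretrained model has latent superhuman ability, a reward signal can in principle elicit it.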
Or if we’re unlucky, sufficiently large models trained for sufficiently long could lead to something like a misaligned mesa optimizer, which would already “want” to use its capabilities in ways that humans wouldn’t.
Interesting, thanks for the reply. I agree that it could develop superhuman ability in some domains, even if that ability doesn’t manifest in the model’s output, so that seems promising (although not very scalable). I haven’t read up on mesa optimizers yet.