(Don’t have time for a detailed response; off the top of my head:)
Some people say that we’ve already had the vast majority of the creative insights that are needed for AGI. For example, they argue that GPT-3 can be made into AGI with a little bit of tweaking and scaling.
Rub the stars out of your eyes for a second. GPT-3 is a huge leap forward, but it still has some massive structural deficiencies. From most to least important:
It doesn’t care whether it says correct things, only whether it completes its prompts in a realistic way
It can’t choose to spend extra computation on more difficult prompts
It has no memory outside of its current prompt
It can’t take advantage of external resources (like using a text file to organize its thoughts, or using a calculator for arithmetic)
It can’t think unless it’s processing a prompt
It doesn’t know that it’s a machine learning model
“But these can be solved with a layer of prompt engineering!” Give me a break. That’s obviously a brittle solution that does not address the underlying issues.
I don’t think that GPT-3 can be made into an AGI with “a little bit of tweaking and scaling”. But I think something that’s a scaled-up instructGPT (i.e., massive unsupervised pretraining on easy-to-collect data → small amounts of instruction finetuning and RLHF) could definitely be transformative and scary.
As a meta point, I’m also not sure that GPT-3 lacking a capability is particularly strong evidence that (instruct)GPT-4 won’t be able to do it. For one, GPT-3 is a two-year-old model at this point (even the RLHF-finetuned instructGPT is nine months old), and SOTA has moved quite a bit beyond where it was when GPT-3 was made. Chinchilla and PaLM, for example, are both significantly better than GPT-3 on most downstream benchmarks, and better than instructGPT on many of them. And we don’t have public benchmarks for many of the RLHF’ed models that are deployed or will be deployed in the near future.
Responding to each of your points in turn:
This is true of GPT-3 (i.e. the original davinci) but kind of unimportant—after all, why should caring about saying “correct things” be necessary for transformative impact? It’s also to some extent false of RLHF’ed (or otherwise finetuned) models like instructGPT, which do care about some reward model that isn’t literally next-token prediction on a web corpus (a schematic contrast between the two training signals is sketched after this list of responses). Again, the reward doesn’t perfectly align with saying true things, but 1) it’s often the case that the models have true models of things they won’t report honestly, 2) it seems possible to RLHF models to be more truthful along some metrics and 3) why does this matter?
I’m not super sure this is true, even as written. I’m pretty sure you can prompt engineer instructGPT so it decides to “think step by step” on harder prompts, while directly outputting the answer on easier ones. But even if this was true, it’s probably fixable with a small amount of finetuning.
This is true, but I’m not sure why being limited to 8000 tokens (or however many for the next generation of LMs) makes it safe? 8000 tokens can be quite a lot in practice. You can certainly get instructGPT to summarize information to pass to itself, for example. I do think there are many tasks that are “inherently” serial and require more than 8000 tokens, but I’m not sure I can make a principled case that any of these are necessary for scary capabilities.
As written this claim is just false even of instructGPT: https://twitter.com/goodside/status/1581805503897735168 . But even if there were certain tools that instructGPT can’t use with only some prompt engineering assistance (and there are many), why are you so confident that this can’t be fixed with a small amount of finetuning on top of this, or by the next generation of models?
Yep, this is fair. The only sense in which computation happens at all in the LM is when a prompt is fed in. But why is this important? Why is thinking only when prompted a fundamental limitation that prevents these models from being scary?
I think this claim is probably true of instructGPT, but I’m not sure how you’d really elicit this knowledge from a language model (“Sampling can prove the presence of knowledge, but not its absence”). I’m also not sure that it can’t be fixed with better prompt engineering (maybe even just: “you are instructGPT, a large language model serving the OpenAI API”). And even if it were true that you couldn’t fix it with prompt engineering, scaffolding, or finetuning, I think you’ll need to say more about why this is necessary for scary capabilities.
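For concreteness, here is the schematic contrast between the two training signals referenced in the response to point 1: plain next-token prediction versus an instructGPT-style RLHF objective (a learned preference reward with a KL penalty back to the pretrained model). This is an illustrative sketch, not real training code; every helper function in it is a hypothetical placeholder.

```python
# Schematic contrast, for illustration only: "cares about next-token prediction"
# versus "cares about a learned reward model". Every helper here is a
# hypothetical placeholder, not a real library call.

def log_prob_next_token(model, prefix, token):
    """Placeholder: log P(token | prefix) under the language model."""
    raise NotImplementedError

def preference_reward(reward_model, prompt, completion):
    """Placeholder: scalar reward from a model trained on human preference comparisons."""
    raise NotImplementedError

def kl_to_pretrained(policy, pretrained, prompt, completion):
    """Placeholder: KL penalty keeping the finetuned policy close to the base model."""
    raise NotImplementedError

def pretraining_objective(model, tokens):
    # Original GPT-3 / davinci training signal: imitate web text, one token at a time.
    return sum(log_prob_next_token(model, tokens[:i], tokens[i])
               for i in range(1, len(tokens)))

def rlhf_objective(policy, pretrained, reward_model, prompt, completion, beta=0.02):
    # instructGPT-style signal: maximize a learned preference reward minus a KL
    # penalty. The reward correlates with being helpful and truthful, but it is
    # not the same thing as "says true things".
    return (preference_reward(reward_model, prompt, completion)
            - beta * kl_to_pretrained(policy, pretrained, prompt, completion))
```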
Again, I don’t think that GPT is dangerous by itself, and it seems unlikely that GPT+scaffolding will be dangerous either. That being said, you’re making a much stronger claim in the “Current state of the art” section than just “GPT-3 is not dangerous”: that we won’t get AGI in the next 50 years. I do think you can make a principled case that models in the foundation model + finetuning + prompt engineering + scaffolding regime won’t be dangerous (even over the next 50 years), but you need to do more than list a few dubiously correct claims without evidence and then scoff at prompt engineering.
A lot of your post talks about an advanced GPT being transformative or scary. I don’t disagree, unless you’re using some technical definition of transformative. I think GPT-3 is already pretty transformative. But AGI goes way beyond that, and that’s what I’m very doubtful is coming in our lifetimes.
It doesn’t care whether it says correct things, only whether it completes its prompts in a realistic way
1) it’s often the case that the models have true models of things they won’t report honestly
2) it seems possible to RLHF models to be more truthful along some metrics and
3) why does this matter?
As for why it matters, I was going off the Future Fund definition of AGI: “For any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less.” Being able to focus on correctness is a requirement of many jobs, and therefore it’s a requirement for AGI under this definition. But there’s no reliable way to make GPT-3 focus on correctness, so GPT-3 isn’t AGI.
Now that I think more about it, I realize that definition of AGI bakes in an assumption of alignment. Under a more common definition, I suppose you could have a program that only cares about giving realistic completions to prompts, and it would still be AGI if it were using human-level (or better) reasoning. So for the rest of this comment, let’s use that more common understanding of AGI (it doesn’t change my timeline).
It can’t choose to spend extra computation on more difficult prompts
I’m not super sure this is true, even as written. I’m pretty sure you can prompt engineer instructGPT so it decides to “think step by step” on harder prompts, while directly outputting the answer on easier ones. But even if this was true, it’s probably fixable with a small amount of finetuning.
If you mean adding “think step-by-step” to the prompt, then this doesn’t fully solve the problem. It still gets just one forward pass per token that it outputs. What if some tokens require more thought than others?
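For concreteness, the kind of prompt scaffolding being debated here looks roughly like the sketch below: one call asks the model to judge the question's difficulty, and a second call either answers directly or with an explicit chain of thought. This is a hypothetical illustration (the `complete` function is a stand-in for whatever completion API you use), and it doesn't change the point above: each output token still gets exactly one forward pass.

```python
# Hypothetical sketch of "decide whether to think step by step" via prompting.
# `complete` is a stand-in for an instruction-tuned LM completion API; it is not
# a real library call. Extra "thinking" here only means generating more tokens;
# every output token still gets a single forward pass.

def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LM completion API call")

def answer(question: str) -> str:
    # First call: ask the model to judge how hard the question is.
    verdict = complete(
        "Is the following question EASY or HARD? Reply with one word.\n\n" + question
    ).strip().upper()

    if verdict == "HARD":
        # Spend more output tokens on an explicit chain of thought.
        return complete(question + "\n\nLet's think step by step.")
    # Otherwise answer directly with fewer tokens.
    return complete(question + "\n\nAnswer in one sentence:")
```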
It has no memory outside of its current prompt
This is true, but I’m not sure why being limited to 8000 tokens (or however many for the next generation of LMs) makes it safe? 8000 tokens can be quite a lot in practice. You can certainly get instructGPT to summarize information to pass to itself, for example. I do think there are many tasks that are “inherently” serial and require more than 8000 tokens, but I’m not sure I can make a principled case that any of these are necessary for scary capabilities.
“Getting it to summarize information to pass to itself” is exactly what I mean when I say prompt engineering is brittle and doesn’t address the underlying issues. That’s an ugly hack for a problem that should be solved at the architecture level. For one thing, it’s not going to be able to recover its complete and correct hidden state from English text.
We know from experience that the correct answers to hard math problems have an elegant simplicity. An approach that feels this clunky will never be the answer to AGI.
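For reference, the “summarize information to pass to itself” scaffold under discussion looks roughly like the sketch below (hypothetical; `complete` stands in for an LM completion API). The objection above is visible directly in the code: the only state carried between calls is an English-text summary, not the model's internal activations.

```python
# Hypothetical sketch of the rolling-summary scaffold discussed above. `complete`
# is a stand-in for an LM completion API, not a real library call. The only
# "memory" carried between calls is the English summary string; none of the
# model's hidden state survives, which is the lossiness objected to above.

def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LM completion API call")

def digest_long_document(chunks: list[str]) -> str:
    summary = ""  # the model's only memory between calls
    for chunk in chunks:
        summary = complete(
            "Summary of the document so far:\n" + summary + "\n\n"
            "Next section of the document:\n" + chunk + "\n\n"
            "Write an updated summary, keeping every detail that might matter later:"
        )
    return summary
```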
It can’t take advantage of external resources (like using a text file to organize its thoughts, or using a calculator for arithmetic)
As written this claim is just false even of instructGPT: https://twitter.com/goodside/status/1581805503897735168 . But even if there were certain tools that instructGPT can’t use with only some prompt engineering assistance (and there are many), why are you so confident that this can’t be fixed with a small amount of finetuning on top of this, or by the next generation of models?
It’s interesting to see it calling Python like that. That is pretty cool. But it’s still unimaginably far behind humans. For example, it can’t interact back-and-forth with a tool, e.g. run some code, get an error, check Google about the error, adjust the code. I’m not sure how you would fit such a workflow into the “one pass per output token” paradigm, and even if you could, that would again be a case where you are abusing prompt engineering to paper over an inadequate architecture.
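For concreteness, the run-the-code, read-the-error, revise-the-code workflow described above would have to live in external scaffolding, roughly like the hypothetical sketch below (`complete` stands in for an LM completion API). Whether a current model can drive this loop reliably, and whether wrapping it this way counts as fixing the architecture or papering over it, is exactly the disagreement here.

```python
# Hypothetical sketch of a run -> error -> revise loop implemented as external
# scaffolding around an LM. `complete` is a stand-in for an LM completion API,
# not a real library call.
import subprocess
import sys

def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LM completion API call")

def write_and_debug(task: str, max_attempts: int = 3) -> str:
    code = complete("Write a Python script that does the following:\n" + task)
    for _ in range(max_attempts):
        # Run the generated script in a subprocess and capture its output.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=30,
        )
        if result.returncode == 0:
            return code  # ran without errors
        # Feed the error message back and ask for a corrected script.
        code = complete(
            "This Python script failed:\n" + code + "\n\n"
            "Error:\n" + result.stderr + "\n\n"
            "Write a corrected version of the script:"
        )
    return code
```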