A lot of your post talks about an advanced GPT being transformative or scary. I don’t disagree, unless you’re using some technical definition of transformative. I think GPT-3 is already pretty transformative. But AGI goes way beyond that, and that’s what I’m very doubtful is coming in our lifetimes.
It doesn’t care whether it says correct things, only whether it completes its prompts in a realistic way
1) it’s often the case that models have accurate internal models of things they won’t report honestly,
2) it seems possible to use RLHF to make models more truthful along some metrics, and
3) why does this matter?
As for why it matters, I was going off the Future Fund definition of AGI: “For any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less.” Being able to focus on correctness is a requirement of many jobs, and therefore it’s a requirement for AGI under this definition. But there’s no reliable way to make GPT-3 focus on correctness, so GPT-3 isn’t AGI.
Now that I think more about it, I realize that definition of AGI bakes in an assumption of alignment. Under a more common definition, I suppose you could have a program that only cares about giving realistic completions to prompts, and it would still be AGI if it were using human-level (or better) reasoning. So for the rest of this comment, let’s use that more common understanding of AGI (it doesn’t change my timeline).
It can’t choose to spend extra computation on more difficult prompts
I’m not super sure this is true, even as written. I’m pretty sure you can prompt-engineer instructGPT so it decides to “think step by step” on harder prompts, while directly outputting the answer on easier ones. But even if this were true, it’s probably fixable with a small amount of finetuning.
If you mean adding “think step-by-step” to the prompt, then this doesn’t fully solve the problem. It still gets just one forward pass per token that it outputs. What if some tokens require more thought than others?
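To make the workaround concrete, here’s roughly what it amounts to in code. This is only a sketch: complete() is a hypothetical stand-in for whatever completion endpoint you’d call, not anything instructGPT actually exposes.

```python
# Sketch of the "think step-by-step" workaround. Illustrative only:
# complete() is a hypothetical helper wrapping some text-completion API.

def complete(prompt: str) -> str:
    raise NotImplementedError("wrap your preferred completion API here")

def answer_directly(question: str) -> str:
    # The model emits the answer immediately: one forward pass per answer token.
    return complete(f"Q: {question}\nA:")

def answer_step_by_step(question: str) -> str:
    # The model is prompted to emit reasoning tokens before the answer.
    # Extra "thinking" only happens by emitting extra tokens; each token is
    # still produced by a single fixed-size forward pass.
    reasoning = complete(f"Q: {question}\nA: Let's think step by step.")
    return complete(
        f"Q: {question}\nA: Let's think step by step.{reasoning}\n"
        "Therefore, the final answer is:"
    )
```

The comments are the point: the only way the model gets more “thinking” is by emitting more tokens, and the compute spent per token never changes.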
It has no memory outside of its current prompt
This is true, but I’m not sure why being limited to 8000 tokens (or however many for the next generation of LMs) makes it safe. 8000 tokens can be quite a lot in practice. You can certainly get instructGPT to summarize information to pass to itself, for example. I do think there are many tasks that are “inherently” serial and require more than 8000 tokens, but I’m not sure I can make a principled case that any of these are necessary for scary capabilities.
“Getting it to summarize information to pass to itself” is exactly what I mean when I say prompt engineering is brittle and doesn’t address the underlying issues. That’s an ugly hack for a problem that should be solved at the architecture level. For one thing, it’s not going to be able to recover its complete and correct hidden state from English text.
We know from experience that the correct answers to hard math problems have an elegant simplicity. An approach that feels this clunky will never be the answer to AGI.
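For concreteness, the kind of loop I’m calling a hack looks something like this (again just a sketch, with the same hypothetical complete() helper standing in for a real completion API):

```python
# Sketch of the "summarize information to pass to itself" workaround.
# complete() is the same hypothetical completion helper as in the sketch above.

def complete(prompt: str) -> str:
    raise NotImplementedError("wrap your preferred completion API here")

def work_through_document(chunks: list[str]) -> str:
    """Handle input longer than the context window by carrying forward a
    running natural-language summary instead of any real internal state."""
    summary = ""
    for chunk in chunks:
        summary = complete(
            "Summary of everything so far:\n"
            f"{summary}\n\n"
            "Next part of the document:\n"
            f"{chunk}\n\n"
            "Write an updated summary, keeping every detail needed later:"
        )
    return summary
```

Everything the model “remembers” at one step has to survive being squeezed through that English-language summary before the next step, which is exactly the lossy round trip I’m objecting to.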
It can’t take advantage of external resources (like using a text file to organize its thoughts, or using a calculator for arithmetic)
As written, this claim is just false even of instructGPT: https://twitter.com/goodside/status/1581805503897735168 . But even if there were certain tools that instructGPT can’t use with only some prompt-engineering assistance (and there are many), why are you so confident that this can’t be fixed with a small amount of finetuning on top of this, or by the next generation of models?
It’s interesting to see it calling Python like that. That is pretty cool. But it’s still unimaginably far behind humans. For example, it can’t interact back-and-forth with a tool, e.g., run some code, get an error, look up the error on Google, adjust the code. I’m not sure how you would fit such a workflow into the “one pass per output token” paradigm, and even if you could, that would again be a case where you are abusing prompt engineering to paper over an inadequate architecture.
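Concretely, the only way I can see to get that kind of workflow today is to drive the model from an external loop, something like the sketch below (complete() is the same hypothetical helper as before; the whole harness is ordinary Python living outside the model):

```python
# Sketch of a run / read-the-error / retry loop wrapped around a language model.
# complete() is the same hypothetical completion helper as in the earlier sketches.

import subprocess
import sys

def complete(prompt: str) -> str:
    raise NotImplementedError("wrap your preferred completion API here")

def run_code(code: str) -> tuple[bool, str]:
    """Run model-written Python in a subprocess and capture any error output."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0, result.stderr

def write_working_code(task: str, max_attempts: int = 3) -> str:
    # The retry logic, the error inspection, and the decision about what to do
    # next all live out here in the harness, not inside the model.
    code = complete(f"Write a Python script that {task}.\nScript:\n")
    for _ in range(max_attempts):
        ok, error = run_code(code)
        if ok:
            return code
        code = complete(
            f"This script:\n{code}\n"
            f"failed with this error:\n{error}\n"
            "Write a corrected version of the script:"
        )
    return code
```

Notice that all the back-and-forth, the retries, and the decision to stop live in the scaffolding. The model itself still just completes one prompt at a time, which is why I call this papering over the architecture rather than fixing it.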