I feel like you’ve significantly misrepresented the people who think AGI is 10-20 years away.
Two things you mention:
Notice that this is a math problem, not an engineering problem... They’re sweeping all of the math work—all of the necessary algorithmic innovations—under the rug. As if that stuff will just fall into our lap, ready to copy into PyTorch.
But creative insights do not come on command. It’s not unheard of that a math problem remains open for 1000 years.
And with respect to scale maximalism, you write:
Some people say that we’ve already had the vast majority of the creative insights that are needed for AGI. For example, they argue that GPT-3 can be made into AGI with a little bit of tweaking and scaling...”But these can be solved with a layer of prompt engineering!” Give me a break. That’s obviously a brittle solution that does not address the underlying issues.
So—AGI is not (imo) a pure engineering problem as you define it, in the sense of “We have all the pieces, and just need to put them together”. Some people have suggested this, but many people with sub-20-year timelines don’t believe it. And I haven’t heard of anyone saying GPT-3 can be made into AGI with a bit of tweaking, scaling, and prompt engineering.
But I wouldn’t call it a math problem as you define it either, where we have no idea how to make progress and the problem is completely unsolvable until suddenly it isn’t. We have clearly made steady progress on deep learning, year after year, for at least the last decade. That progress includes loads of algorithmic innovations that people went out and found, the very innovations you claim we’re “sweeping under the rug”. We’re not sweeping them under the rug; we’re looking at the last ten years of progress and extrapolating it forward! We have solved problems that were thought impossible or highly intractable, like Go. We don’t know exactly how long the path is, but we can definitely look back and think there is a pretty solid probability we’re closer than we were last year. Maybe we need a paradigm shift to get AGI, and our current efforts will be dead ends. On the other hand, deep learning and transformers have both been paradigm shifts, and they’ve happened in the last couple of decades—transformers are only a few years old. We could need two more paradigm shifts and still get them in the next 20 years.
The general argument I would make for <50 year timelines is this:
Over the last decade, we have been making incredibly fast progress, both in algorithms and in scaling, in deep learning.
Deep learning and transformers are both recent paradigm shifts that have led to huge capability increases in AI. We see no signs of this slowing down.
We have solved, or made massive progress on, multiple problems that people had previously predicted were highly intractable, despite not fundamentally understanding intelligence (Go, natural language processing).
Given this, we can see that we’ve made fast progress, our current paradigms are scaling strongly, and we seem to have the ability to create paradigm shifts when needed. While this is far from a guarantee of AGI by 2040, or even 2070, it seems like there is a very plausible path to AGI in that timeframe, and I’d assign it much more than 10% probability mass.
Also, for what it’s worth—I did your thought experiment. Option 1 fills me with a deep sense of relief, and Option 2 fills me with dread. I don’t want AGI, and if I were convinced you were correct about <10% AGI by 2070, I would seriously consider working on something else (direct work in software engineering, earning to give for global poverty, or going back to school and working to prevent aging if I found myself reluctant to give up the highly ambitious nature of the AI alignment problem).
And I haven’t heard of anyone saying GPT-3 can be made into AGI with a bit of tweaking, scaling, and prompt engineering.
I am one who says that (not with certainty, but with high probability), so I thought I would chime in.
The main ideas behind my belief are:
1. The Kaplan and Chinchilla scaling-law papers show the functional relationship between training resources and cross-entropy loss. I believe with high probability that this scaling won’t break down significantly, i.e. that we can get ever closer to the theoretical irreducible entropy with transformer architectures. (See the sketch after this list.)
2. Cross-entropy loss measures the distance between two probability distributions, in this case the distribution of human-generated text (encoded as tokens) and the distribution generated by the model. I believe with high probability that this measure is relevant, i.e. that we can only reach a low enough cross-entropy loss when the model is capable of doing human-comparable intellectual work (irrespective of whether it actually does it).
3. After the model achieves the necessary cross-entropy loss, and consequently becomes capable somewhere within it of producing AGI-level work (as per 2.), we can get the model to output that level of work with minor tweaks. I don’t have specifics, but think on the level of letting the model recursively call itself on some generated text via a special output command or some such; a hypothetical sketch of what I mean follows below.
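To make point 1 (and the “irreducible entropy” it refers to) concrete, here is a minimal sketch of the kind of parametric scaling law I have in mind. The functional form is the one fit in the Chinchilla paper; the constants are quoted from memory and should be treated as illustrative rather than exact:

```python
# Minimal sketch of a Chinchilla-style parametric scaling law:
#   L(N, D) = E + A / N**alpha + B / D**beta
# N = parameter count, D = training tokens, L = cross-entropy loss (nats/token),
# E = irreducible entropy of the text distribution.
# The constants below are roughly the fits reported by Hoffmann et al. (2022),
# quoted from memory -- illustrative only.

E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# As N and D grow, the two reducible terms shrink toward zero and the
# predicted loss approaches E -- the sense in which we "get ever closer to
# the theoretical irreducible entropy".
for n, d in [(70e9, 1.4e12), (700e9, 14e12), (7e12, 140e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss ~= {predicted_loss(n, d):.3f}")
```

Point 2 is then the claim that this number, the remaining gap to E, tracks something meaningful about capability rather than just token-level mimicry.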
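And for point 3, a purely hypothetical sketch of the kind of “minor tweak” I mean by letting the model recursively call itself. The special token, the generate placeholder, and the depth limit are all made up for illustration, not a description of any real system:

```python
# Hypothetical sketch only: "let the model recursively call itself on some
# generated text with a special output command". Nothing here corresponds to
# a real API; generate() stands in for whatever completion interface exists.

RECURSE = "<|recurse|>"   # made-up special output command
MAX_DEPTH = 8             # arbitrary limit on recursion depth

def generate(prompt: str) -> str:
    """Placeholder for a single call to the underlying language model."""
    raise NotImplementedError

def answer(prompt: str, depth: int = 0) -> str:
    """If the model emits the RECURSE marker, feed the text after the marker
    back in as a fresh prompt, so the model can delegate sub-problems to
    itself instead of solving everything in one completion."""
    output = generate(prompt)
    if RECURSE in output and depth < MAX_DEPTH:
        kept, subtask = output.split(RECURSE, 1)
        return kept + answer(subtask.strip(), depth + 1)
    return output
```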
I don’t think prompt engineering is relevant to AGI.
I would be glad for any information that can help me update.