>I think GPT-3 is the trigger for 100x larger projects at Google, Facebook and the like, with timelines measured in months.
My impression is that this prediction has turned out to be mistaken (though it’s hard to say for sure, because “measured in months” is pretty ambiguous). There have been models with many times the number of parameters (notably one by Google*), but it’s clear that, nine months after this post, there have been no publicised efforts using anywhere close to 100x the compute of GPT-3. I’m curious whether and how the author (or others who agreed with the post) have changed their minds about the overhang and related hypotheses, in light of some of this evidence failing to pan out the way the author predicted.
*https://arxiv.org/abs/2101.03961
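For a rough sense of scale: “100x the compute of GPT-3” can be put in concrete FLOP terms. A minimal sketch, assuming the commonly cited estimate of ~3,640 petaflop/s-days for GPT-3’s training run (a figure not stated in this thread):

```python
# Assumed figure: GPT-3's training run is commonly estimated at
# ~3,640 petaflop/s-days, i.e. roughly 3.1e23 FLOPs.
gpt3_flops = 3640 * 1e15 * 86400  # petaflop/s-days -> total FLOPs
hundred_x = 100 * gpt3_flops

print(f"GPT-3 training compute: {gpt3_flops:.2e} FLOPs")
print(f"100x that:              {hundred_x:.2e} FLOPs")
```

Under that assumption, a 100x run would be on the order of 3e25 FLOPs, which makes it easier to see why no publicised effort had reached that scale nine months later.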
Nine months later I consider my post pretty ‘shrill’, for want of a better adjective. I regret not making more concrete predictions at the time, because yeah, reality has substantially undershot my fears. I think there’s still a substantial chance of something 10x larger being revealed within 18 months (which I think is the upper bound on ‘timeline measured in months’), but it looks very unlikely that there’ll be a 100x increase in that time frame.

One factor I got wrong in writing the above was treating my massive update in response to GPT-3 as somewhere near the median reaction, rather than as a substantial outlier. As another example of this: I am the only person I know of who, after GPT-3, dropped everything they were doing to re-orient their career towards AI safety. And that’s within circles of people you’d think would be primed to respond similarly!

I still think AI projects could be run at vastly larger budgets, so in that sense I still believe there is an orders-of-magnitude overhang. But convincing the people with those budgets to fund such projects is apparently much harder than I thought.
I am not unhappy about this.
Curious if you have any other thoughts on this after another 10 months?
Those I know who train large models seem very confident we will see 100-trillion-parameter models before the end of the decade, but do not seem to think it will happen in, say, the next two years.
There is a strange, disconcerting phenomenon where many of the engineers best positioned to know (those I’ve talked to who work for, and in one case own, companies training 10-billion-plus-parameter models) seem to have timelines on the order of 5–10 years. Shane Legg recently said he gives a 50% chance of AGI by 2030, which is in line with some of the people I’ve talked to on EAI, though many disagree. Leo Gao, I believe, tends to think OpenPhil’s more aggressive estimates are about right, which is a longer timeline than some.

I would like people with “really short timelines” to post more about them. Assuming common knowledge of short timelines is a good thing, the position is not talked about here as much as it should be, given how many people seem to believe it.
For what it’s worth, I settled on the aggressive distribution from the Ajeya report as a reasonable prior after taking a quick skim of the report and eyeballing the various distributions to see which one felt most right to me (not a super rigorous process). The best-guess timeline feels definitely too slow to me. The biggest reason my timeline estimate isn’t shorter is essentially a correction for the planning fallacy.
FWIW, if the current trend continues we will first see 1e14-parameter (100 trillion) models two to four years from now.
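That extrapolation can be sanity-checked with a small calculation. This sketch assumes GPT-3’s ~175 billion parameters as the mid-2020 baseline and an illustrative ~10x-per-year growth in parameter counts; both figures are assumptions for the example, not claims from the thread:

```python
import math

# Assumed baseline: GPT-3 at ~1.75e11 parameters (mid-2020).
gpt3_params = 1.75e11
target_params = 1e14  # 100 trillion

# Assumed trend: parameter counts of the largest dense models
# growing roughly 10x per year.
growth_per_year = 10.0

factor = target_params / gpt3_params
years = math.log(factor) / math.log(growth_per_year)

print(f"scale-up needed: {factor:.0f}x")
print(f"years at 10x/year: {years:.1f}")
```

At a 10x-per-year growth rate the ~571x scale-up takes just under three years, which lands inside the “two to four years” window; a slower assumed trend pushes it toward the top of that range.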
>I think there’s still a substantial chance of something 10x larger being revealed within 18 months (which I think is the upper bound on ‘timeline measured in months’)
So did that happen?
I suppose the new scaling laws render this sort of thinking obsolete.