The definition of “year Y compute requirements” is complicated in a way that’s kind of crucial here: it attempts to a) account for the fact that you can’t take an arbitrary amount of compute and turn it into a solution for some task literally instantly, while b) capturing that there still seems to be a meaningful notion of “the compute you need to do some task decreases over time.” I go into it in this section of part 1.
First we start with the “year Y technical difficulty of task T:”
In year Y, imagine a largeish team of good researchers (e.g. the size of AlphaGo’s team) is embarking on a dedicated project to solve task T.
They get an amount of money, D dollars, dumped on them, which could be more money than exists in the whole world, like 10 quadrillion dollars or whatever.
With a few years of dedicated effort (e.g. 2-5), plus whatever fungible resources they could buy with D dollars (e.g. compute, data, and low-skilled human labor), can that team of researchers produce a program that solves task T? Here we assume that the fungible resources are infinitely available if you pay, so e.g. if you pay a quadrillion dollars you can get an amount of compute that is (FLOP/$ in year Y) * (1 quadrillion), even though we obviously don’t have that many computers.
And the “technical difficulty of task T in year Y” is how big D has to be for the best plan the researchers can come up with in that time to work. What I wrote in the doc was:
The price of the bundle of resources that it would take to implement the cheapest solution to T that researchers could have readily come up with by year Y, given the CS field’s understanding of algorithms and techniques at that time.
And then you have “year Y compute requirements,” which is whatever amount of compute they’d buy with whatever portion of D dollars they spend on compute.
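To make that chain of definitions concrete, here’s a minimal sketch in Python; the FLOP/$ figure, the budget, and the fraction spent on compute are all made-up placeholders, not numbers from the report:

```python
# Minimal sketch of the definitions above. All numbers are illustrative
# placeholders, not values from the report.

flop_per_dollar_year_y = 1e17      # assumed price-performance (FLOP/$) in year Y
d_dollars = 1e15                   # D: the (possibly absurd) budget, e.g. a quadrillion dollars
fraction_spent_on_compute = 0.8    # assumed share of D the team puts toward compute

# Total compute the team could buy if they spent all of D on compute,
# under the "infinitely available if you pay" assumption:
total_buyable_flop = flop_per_dollar_year_y * d_dollars

# "Year Y compute requirements": the compute bought with the portion of D
# actually spent on compute in the team's cheapest workable plan.
year_y_compute_requirements = flop_per_dollar_year_y * fraction_spent_on_compute * d_dollars

print(f"Total buyable compute: {total_buyable_flop:.1e} FLOP")
print(f"Year-Y compute requirements: {year_y_compute_requirements:.1e} FLOP")
```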
This definition is convoluted, which isn’t ideal, but after thinking about it for ~10 hours it was the best I could do to balance a) and b) above.
With all that said, I actually do think that the team of good researchers could have gotten GPT-level performance with somewhat more compute a couple of years ago, and AlphaGo-level performance with significantly more compute several years ago. I’m not sure exactly what the ratio would be, but I don’t think it’s many OOMs.
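As a purely illustrative piece of arithmetic (the halving time below is assumed, not a figure from the report), here’s how a steady rate of algorithmic progress translates into the compute ratio for doing the same task a few years earlier:

```python
import math

# Assumed, purely for illustration: the compute needed for a fixed level of
# performance halves every `halving_time_years` years. Not a report figure.
halving_time_years = 2.0
years_earlier = 4               # e.g. attempting the same task 4 years earlier

extra_compute_factor = 2 ** (years_earlier / halving_time_years)
extra_ooms = math.log10(extra_compute_factor)

print(f"Extra compute needed: {extra_compute_factor:.1f}x ({extra_ooms:.2f} OOMs)")
```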
The thing you said about it being an average with a lot of spread is also true. I think a better version of the model would have probability distributions over the algorithmic progress, hardware progress, and spend parameters; I didn’t put that in because the focus of the report was estimating the 2020 compute requirements distribution. I did try some different values for those parameters in my aggressive and conservative estimates, but in retrospect the spread on those was not wide enough.
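For what that “distributions over the other parameters” version might look like structurally, here’s a rough Monte Carlo sketch; every distribution and baseline below is a placeholder I made up for illustration, not an estimate from the report:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Placeholder distributions, chosen only to illustrate the structure; the
# report used point estimates (plus aggressive/conservative variants) for
# the progress and spend parameters.
log10_req_2020 = rng.normal(loc=34.0, scale=2.0, size=n)               # log10 FLOP needed in 2020
algo_halving_yrs = rng.lognormal(mean=np.log(2.5), sigma=0.3, size=n)  # years per halving (algorithmic progress)
hw_doubling_yrs = rng.lognormal(mean=np.log(2.5), sigma=0.3, size=n)   # years per doubling of FLOP/$
log10_spend_2030 = rng.normal(loc=9.0, scale=0.5, size=n)              # log10 of $ spent on the biggest run in 2030

years = 10  # 2020 -> 2030
log10_req_2030 = log10_req_2020 - (years / algo_halving_yrs) * np.log10(2)
log10_flop_per_dollar_2030 = 17.0 + (years / hw_doubling_yrs) * np.log10(2)  # 1e17 FLOP/$ in 2020 assumed
log10_affordable_2030 = log10_spend_2030 + log10_flop_per_dollar_2030

p_enough = (log10_affordable_2030 >= log10_req_2030).mean()
print(f"P(affordable compute in 2030 >= 2030 requirements): {p_enough:.2f}")
```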
Nice, thanks!