Ajeya Cotra estimates that the total cost of training large deep learning models is likely 10 to 100 times the reported cost of the final training run. Combining this with the number of large models trained, taken from this spreadsheet, I think we end up with 1 to 10 billion dollars spent on training these models in total in 2021. … The Metaculus community currently forecasts that the GPT line of language models earning 1 billion dollars of customer revenue over a period of four to five years is the median scenario. Even if this whole revenue were due purely to GPT-3, Cotra’s cost multiplier implies that the total cost of creating GPT-3 was anywhere from 100 million to 1 billion dollars.
Nitpick: It obviously wasn’t, though. OA didn’t even have $1b to spend on just GPT-3, come on. (That $1b MS investment you’re thinking of came afterwards, and anyway, Sam Altman is on record somewhere, maybe one of the SSC Q&As, as ballparking total GPT-3 costs at like $20m, I think, in response to people trying to estimate the compute cost of the final model at ~$10m. Certainly not anywhere close to $1000m.) That multiplier assumes old-style, non-scaling-law-based research, like training a full-scale OA5 and then retraining it again from scratch; but the transition Cotra describes researchers making has already happened. What they actually did for GPT-3 was train a bunch of tiny models, on the order of 0.1b parameters, to fit the scaling laws in Kaplan et al., and then do a single 175b-parameter run, so the final model winds up being something like 90% of the compute; the prior work becomes a rounding error. A 2-3x multiplier would be much saner. And this applies to all of those other models too. Do you think GB trained 10 PaLMs, or Nvidia/MS trained 10 Megatron-Turing NLGs, or pilot runs equivalent to that? No, of course not.
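To make the compute ratio concrete, here is a rough back-of-the-envelope sketch (mine, not from either comment) using the standard "training FLOPs ≈ 6 · parameters · tokens" approximation. The 175b/300b-token figures for the final GPT-3 run are published numbers; the size of the pilot sweep (50 runs, ~30b tokens each) is an illustrative assumption, not a reported figure.

```python
# Back-of-the-envelope: why one scaling-law-guided big run dominates total compute.
# Uses the common approximation: training FLOPs ~= 6 * parameters * tokens.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs (the 6*N*D rule of thumb)."""
    return 6.0 * params * tokens

# Final GPT-3 run: 175e9 parameters trained on ~300e9 tokens (published figures).
final_run = train_flops(175e9, 300e9)        # ~3.15e23 FLOPs

# Hypothetical pilot sweep: 50 small ~0.1e9-parameter models, each trained on
# ~30e9 tokens (an illustrative assumption, not a reported number).
pilot_sweep = 50 * train_flops(0.1e9, 30e9)  # ~9e20 FLOPs

total = final_run + pilot_sweep
print(f"final run share of compute: {final_run / total:.1%}")              # ~99.7%
print(f"compute multiplier over the final run: {total / final_run:.3f}x")  # ~1.003x
```

Under these assumptions the pilot work is well under 1% of the compute, which is the sense in which it "becomes a rounding error"; the saner 2-3x multiplier is presumably covering salaries, failed experiments, and infrastructure rather than extra training compute.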
Set against the roughly 200 million dollars of estimated annual revenue, this means that right now the annual revenue generated by AI models is somewhat less than the investment that goes into them.
Since you’re overestimating many models’ costs by factors of like 97x, this means that the revenue generated by AI models is substantially more than their costs.
I asked numerous people about this while writing the essay, but nobody told me that these estimates were off, so I went ahead and used them. I should have known to get in touch with you instead.
I think Cotra says that the main cost of training these models is not compute but researcher time & effort. How many researcher-hours would you estimate went into designing and training GPT-3? A $20 million cost estimate implies that a team of 100 people (roughly the number of employees at OpenAI in 2020) billing $200/hr would each have put in about 1,000 hours, or roughly 120 full-time days of work on the model. That sounds too low to me, but you’re likely better informed about this, and a factor of 3 or so isn’t going to change much in this context anyway.
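For transparency, here is the tiny calculation behind that "~120 full-time days" figure, written out as a script. The $20m total, 100-person headcount, and $200/hr rate are the numbers from the paragraph above; the 8-hour working day, and treating the whole $20m as researcher time, are my own simplifying assumptions.

```python
# Back-of-the-envelope: researcher time implied by a ~$20M all-in cost estimate.
total_cost_usd = 20_000_000   # Altman's ballpark for GPT-3, per the comment above
headcount = 100               # rough OpenAI headcount in 2020
hourly_rate_usd = 200         # assumed fully loaded cost per researcher-hour
hours_per_day = 8             # assumed full-time working day

# Note: treats the full $20M as researcher time, as in the paragraph above;
# netting out the ~$10M compute estimate would roughly halve these figures.
hours_per_person = total_cost_usd / (headcount * hourly_rate_usd)   # 1,000 hours
days_per_person = hours_per_person / hours_per_day                  # 125 days

print(f"~{hours_per_person:.0f} hours, i.e. ~{days_per_person:.0f} full-time days per person")
```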
One problem I’ve had is that I mention GPT-3 because it’s the only model for which I could find explicit revenue estimates, but I think these hyperbolic growth laws work much better with total investment into an area, because you still benefit from scale effects when working in a field even if you account for only a small fraction of the spending. It was too much work for this piece, but I would really like there to be a big spreadsheet of all large models trained in recent years, with estimates of both their training costs and the revenue they generate, something like the sketch below.
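As a minimal sketch of what such a spreadsheet could look like: the only filled-in numbers below are ones already mentioned in this exchange (the ~$20m GPT-3 ballpark and the Metaculus ~$1b revenue forecast for the GPT line, used as an upper bound for GPT-3 alone); every other cell is left as unknown rather than guessed.

```python
# Sketch of the hypothetical "large models" spreadsheet: one row per model,
# with total training cost and attributable revenue. Unknown cells stay None.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelRow:
    name: str
    year: int
    total_training_cost_usd: Optional[float]   # compute + researcher time
    revenue_usd: Optional[float]               # revenue attributable to the model
    revenue_period_years: Optional[float]

rows = [
    # ~$20M total cost (Altman's ballpark); ~$1B over 4-5 years is the Metaculus
    # median forecast for the whole GPT line, used here as an upper bound.
    ModelRow("GPT-3", 2020, 20e6, 1e9, 4.5),
    ModelRow("PaLM", 2022, None, None, None),
    ModelRow("Megatron-Turing NLG", 2021, None, None, None),
]

for r in rows:
    if r.total_training_cost_usd and r.revenue_usd:
        print(f"{r.name}: revenue/cost ~ {r.revenue_usd / r.total_training_cost_usd:.0f}x")
```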
In the end, to manage this, I went with estimates of AI investment & revenue that are on the high end, in the sense that they cover very general notions of “AI” which aren’t representative of what the frontier of research actually looks like. I would have preferred to use estimates from this hypothetical spreadsheet if it existed, but unfortunately it doesn’t.