I once read an estimate that 5x-ing the size of GPT-3 would 10x the training cost. I can’t find the source now, so if anyone has a better estimate, please let me know.
Even if this is true, the implied cost of scaling up by 100x is already very large:
(Note: Thank you Daniel Kokotajlo for pointing out my math mistake; I have updated the calculations below.)
Number of 5x increases needed to 100x the parameters: 5^n = 100 ⇒ n = log(100)/log(5) ≈ 2.86
Increase in training cost: 10^2.86 ≈ 727
Training a 17-trillion-parameter GPT model (roughly 100x GPT-3’s 175 billion parameters) would then cost around 727 times more than GPT-3, which would give a cost of around 8.7 billion USD. Of course, bigger models alone, without software improvements, might not result in transformative AI.
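For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation in Python. The ~$12M GPT-3 training cost is my own rough assumption, chosen so the result matches the ~8.7 billion USD figure above; the 5x-size-to-10x-cost ratio is the unsourced estimate from the start of the post.

```python
# Back-of-the-envelope check of the scaling estimate above.
# Assumptions (not from a confirmed source): 5x parameters -> 10x training cost,
# GPT-3 at ~175B parameters and a ~$12M training cost.
import math

n = math.log(100) / math.log(5)    # number of 5x steps needed to reach 100x, ~2.86
cost_multiplier = 10 ** n          # ~727x GPT-3's training cost
gpt3_cost_usd = 12e6               # rough assumed GPT-3 training cost

print(f"5x steps needed: {n:.2f}")
print(f"Cost multiplier: {cost_multiplier:.0f}x")
print(f"Estimated cost: ${cost_multiplier * gpt3_cost_usd / 1e9:.1f}B")
```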
Is this true? Or could increasing the size of a larger model cost more, i.e. 5x-ing GPT-3 is 10x, but 5x-ing again costs more than 10x?
That’s a good question. I can see scenarios where the price increase is either more or less than that.
The compute needed for training is, in this example, the only significant factor in the price, and that’s what scales at 10x cost for 5x size. (Sadly, I still can’t find the source where I read it, so again, please share if you have a better method of estimation.)
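To make concrete how sensitive the estimate is to that assumption, here is a small illustrative sketch (the alternative ratios are my own picks, not from any source) showing how the total multiplier for a 100x model changes if each 5x in size costs more or less than 10x:

```python
# Illustrative sensitivity check: how the total cost multiplier for a 100x model
# changes under different cost-per-5x assumptions.
import math

n = math.log(100) / math.log(5)  # ~2.86 five-x steps needed to reach 100x parameters

for cost_per_5x in (8, 10, 12, 15):
    multiplier = cost_per_5x ** n
    print(f"{cost_per_5x}x cost per 5x size -> total ~{multiplier:,.0f}x GPT-3's cost")
```

Under these assumptions the total multiplier ranges from roughly 380x (at 8x per step) to over 2,300x (at 15x per step), so the final figure moves by billions of dollars depending on that one ratio.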
Building the infrastructure to train a model of several trillion parameters could easily create a chip shortage, drastically increasing the cost of AI chips and thus making training far more expensive than the estimate.
However, building such a huge infrastructure might also bring significant economies of scale. For example, Google might build a TPU “gigafactory”, and because of the high volume of TPUs produced, the price per TPU could decrease significantly.
Doing this slowly might create the sort of increased demand that spurs investment and decreases prices.