Are these 2 bullet points faithful to your conclusion?
GPT-4 training run (renting the compute for the final run): 100M$, of which 1⁄3 to 2⁄3 is the cost of the staff
GPT-4 training run + building the supercomputer: 600M$, of which ~20% is the cost of the staff
And some hot takes (mine):
Because supercomputers become “obsolete” quickly (~3 years), you either need to run inference to pay for building your supercomputer (i.e., you need profitable commercial applications), or your training cost must also account for the full cost of the supercomputer, which produces a ~x6 increase in training cost (see the rough sketch after this list).
In forecasting models, we may be underestimating the investment needed to train a frontier model by ~x6 (closer to 600M$ in 2022 than 100M$).
The bottleneck to train new frontier models is now going to be building more powerful supercomputers.
More investments won’t help that much in solving this bottleneck.
This bottleneck will cause most capability gains to come from improving software efficiency.
Open-source models will stay close to frontier models in terms of capability.
This will reduce the profitability of simple and general commercial applications.
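To make the ~x6 figure concrete, here is a minimal sketch of the arithmetic, assuming only the two headline numbers quoted above (~100M$ to rent the compute, ~600M$ including the supercomputer) and the ~3-year useful life; the variable names and the per-year revenue figure are illustrative, not part of the original estimate.

```python
# Minimal sketch of the ~x6 claim, using the (assumed) figures quoted above.

rental_cost = 100e6   # $: renting the compute for the final GPT-4 training run
full_cost = 600e6     # $: training run + building the supercomputer
lifetime_years = 3    # assumed useful life before the supercomputer is "obsolete"

# If the training run must absorb the full supercomputer cost instead of just
# the rental price, the training-cost multiplier is roughly:
multiplier = full_cost / rental_cost
print(f"training-cost multiplier: ~x{multiplier:.0f}")  # ~x6

# Alternatively, if inference revenue is supposed to pay off the extra build
# cost over the machine's useful life (illustrative only):
extra_build_cost = full_cost - rental_cost
print(f"inference revenue needed: ~${extra_build_cost / lifetime_years / 1e6:.0f}M per year")
```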
I think using the term “training run” in that first bullet point is misleading, and “renting the compute” is confusing, since you can’t actually rent the compute just by having $60M; you likely need a multi-year contract.
I can’t tell if you’re attributing the hot takes to me? I do not endorse them.