Topic: AI adoption dynamic
GPT-3:
fixed cost: 4.6M USD
variable cost: 790 requests/USD (source)
Human:
fixed cost: 0-500k USD (depending on whether you start counting from birth and on the task they need to be trained for)
variable cost: 10-1000 USD/day (depending on whether you count their maintenance cost or the rate they charge)
So an AI currently seems more expensive to train, but less expensive to use (as might be obvious to most of you).
Of course, trained humans are better than GPT-3. And this comparison has other limitations. But I still find it interesting.
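To make the fixed-vs-variable trade-off concrete, here is a back-of-the-envelope break-even sketch using the figures above. The human throughput (requests per day) and the mid-range cost picks are illustrative assumptions, not numbers from the post.

```python
# Back-of-the-envelope break-even calculation using the figures above.
# Human throughput and the mid-range cost picks are illustrative assumptions.

GPT3_FIXED_USD = 4_600_000    # estimated training cost
GPT3_REQUESTS_PER_USD = 790   # ~0.0013 USD per request
HUMAN_FIXED_USD = 250_000     # midpoint of the 0-500k USD range (assumption)
HUMAN_USD_PER_DAY = 100       # somewhere in the 10-1000 USD/day range (assumption)
HUMAN_REQUESTS_PER_DAY = 200  # hypothetical throughput (assumption)

def gpt3_total_cost(requests: float) -> float:
    """Training cost plus per-request usage cost."""
    return GPT3_FIXED_USD + requests / GPT3_REQUESTS_PER_USD

def human_total_cost(requests: float) -> float:
    """Training cost plus daily cost for however many days the requests take."""
    return HUMAN_FIXED_USD + (requests / HUMAN_REQUESTS_PER_DAY) * HUMAN_USD_PER_DAY

# Break-even volume r solves: GPT3_FIXED + r * c_ai = HUMAN_FIXED + r * c_h
c_ai = 1 / GPT3_REQUESTS_PER_USD
c_h = HUMAN_USD_PER_DAY / HUMAN_REQUESTS_PER_DAY
break_even = (GPT3_FIXED_USD - HUMAN_FIXED_USD) / (c_h - c_ai)

print(f"break-even at ~{break_even:,.0f} requests")
print(f"GPT-3 total cost there: ${gpt3_total_cost(break_even):,.0f}")
print(f"human total cost there: ${human_total_cost(break_even):,.0f}")
```

Under these particular assumptions the AI only recoups its higher training cost after several million requests; change the throughput or daily-rate assumptions and the break-even point moves by orders of magnitude.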
x-post: https://www.facebook.com/mati.roy.09/posts/10158882964819579
According to one estimate, training GPT-3 would cost at least $4.6 million. And to be clear, training deep learning models is not a clean, one-shot process. There’s a lot of trial and error and hyperparameter tuning that would probably increase the cost several-fold. (source)

All of which was done on much smaller models and GPT-3 just scaled up existing settings/equations—they did their homework. That was the whole point of the scaling papers, to tell you how to train the largest cost-effective model without having to brute force it! I think OA may well have done a single run and people are substantially inflating the cost because they aren’t paying any attention to the background research or how the GPT-3 paper pointedly omits any discussion of hyperparameter tuning and implies only one run (e.g. the dataset contamination issue).
Good to know, thanks!
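A side note on gwern's point about the scaling papers: the idea is that you fit compute-vs-loss curves on small, cheap runs and extrapolate to the big model, rather than brute-forcing hyperparameters at full scale. Below is a minimal sketch of that extrapolation step with entirely made-up data points; only the power-law form is borrowed from the scaling-law literature.

```python
# Toy illustration of scaling-law extrapolation: fit a power law
# loss ~ a * C^(-b) to cheap small-scale runs, then predict the loss of a
# much larger run before paying for it. All data points here are made up.
import numpy as np

compute = np.array([1e-3, 1e-2, 1e-1, 1.0, 10.0])  # hypothetical PF-days
loss = np.array([5.2, 4.4, 3.7, 3.1, 2.6])         # hypothetical val losses

# Straight-line fit in log-log space: log(loss) = log(a) - b * log(C)
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(log_a), -slope

# Extrapolate to a GPT-3-scale compute budget (~3.6k PF-days)
big_compute = 3.6e3
predicted = a * big_compute ** (-b)
print(f"fitted: loss ~ {a:.2f} * C^(-{b:.3f})")
print(f"predicted loss at {big_compute:.0f} PF-days: {predicted:.2f}")
```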