I once read an estimate that 5x-ing the size of GPT-3 would 10x the training cost. I can’t find the source now, so if anyone has a better estimate, please let me know. This would mean that training a 17 trillion parameter GPT model would cost around 25 times more than GPT-3 did, giving a cost of around 300 million USD.
Isn’t that math wrong? 17 trillion parameters is 100x more than GPT-3, so the cost should be at least 100x higher, so if the cost is $12M now it should be at least a billion dollars. I think it would be about $3B. It would probably cost a bit more than that since the scaling law will probably bend soon and more data will be needed per parameter. Also there may be inefficiencies from doing things at that scale. I’d guess maybe $10B give or take an order of magnitude.
With 100x parameters, the cost of inference (using the model) would also increase roughly 100x, giving a cost of about 0.1 USD per 700 words generated. This seems very cheap compared to human labour.
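The inference estimate above can be sketched as a back-of-the-envelope calculation. The baseline figure of roughly $0.001 per 700 words for GPT-3 is an assumption inferred from the thread (it is what makes the 100x scale-up come out to ~$0.10), not a sourced number:

```python
# Back-of-the-envelope check of the inference-cost claim.
# Assumption: GPT-3 inference costs roughly $0.001 per 700 words
# (chosen so that a 100x scale-up matches the ~$0.10 figure above).
gpt3_cost_per_700_words = 0.001  # USD, assumed baseline
scale_factor = 100               # ~100x more parameters (17T vs 175B)

scaled_cost = gpt3_cost_per_700_words * scale_factor
print(f"~${scaled_cost:.2f} per 700 words generated")
```

For comparison, 700 words is on the order of a few minutes of human writing, which is why even $0.10 looks cheap next to human labour.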
Some hopeful speculation (warning: Hope clouds observation):
Maybe the relevant sort of AI system won’t just stream-of-consciousness generate words like GPT-3 does, but rather be some sort of internal bureaucracy of prompt programming that e.g. takes notes to itself, spins off sub-routines to do various tasks like looking up facts, reviews and edits text before finalizing the product, etc., such that 10x or even 100x compute is spent per word of generated text. This would mean $1–$10 per 700 words generated, which is maybe expensive enough for it to be outcompeted by humans in many applications.
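The $1–$10 range follows directly from multiplying the earlier ~$0.10-per-700-words inference estimate (itself an assumption from this thread) by the hypothesized 10x–100x compute overhead:

```python
# Sanity check of the $1-$10 range: apply the hypothesized 10x-100x
# "internal bureaucracy" compute overhead to the assumed ~$0.10 per
# 700 words inference cost from earlier in the thread.
base_inference_cost = 0.10  # USD per 700 words, assumed from above

low = base_inference_cost * 10    # 10x overhead
high = base_inference_cost * 100  # 100x overhead
print(f"${low:.0f} - ${high:.0f} per 700 words generated")
```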
If multiplying the size by 5 means multiplying the cost by 10 (and if this relationship is consistent as size continues to increase) then a 100x size increase is about 2.86 5x-ings, which means a cost increase of about 10^2.86 or about 730, which means that $12M becomes about $8.7B. (Much more than the $3B you suggest and of course much much more than the $300M OP suggests.)
[EDITED to add:] Oops, should have refreshed the page before commenting; I see that OP has already fixed this.
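The scaling arithmetic in the comment above can be checked numerically. The $12M GPT-3 training cost and the "5x size means 10x cost" relationship are both taken from the thread, not independently sourced:

```python
import math

# If 5x-ing model size 10x-es training cost, a 100x size increase is
# log(100)/log(5) ~= 2.86 "5x-ings", i.e. a ~10^2.86 cost increase.
base_cost_usd = 12e6   # assumed GPT-3 training cost, from the thread
size_increase = 100    # 17T params / 175B params ~= 100x

n_5x_steps = math.log(size_increase) / math.log(5)  # ~2.86
cost_multiplier = 10 ** n_5x_steps                  # ~727
scaled_cost_usd = base_cost_usd * cost_multiplier   # ~$8.7B
print(f"{n_5x_steps:.2f} 5x-ings -> {cost_multiplier:.0f}x "
      f"-> ${scaled_cost_usd / 1e9:.1f}B")
```

This reproduces both the ~730x multiplier and the ~$8.7B figure quoted in the comment.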
> Isn’t that math wrong? 17 trillion parameters is 100x more than GPT-3, so the cost should be at least 100x higher, so if the cost is $12M now it should be at least a billion dollars. I think it would be about $3B. It would probably cost a bit more than that since the scaling law will probably bend soon and more data will be needed per parameter. Also there may be inefficiencies from doing things at that scale. I’d guess maybe $10B give or take an order of magnitude.
You are absolutely correct; the cost must be more than 100x higher if cost scales faster than the number of parameters. I have now updated the calculations and got a 727x increase in cost.
> Maybe the relevant sort of AI system won’t just stream-of-consciousness generate words like GPT-3 does, but rather be some sort of internal bureaucracy of prompt programming that e.g. takes notes to itself, spins off sub-routines to do various tasks like looking up facts, reviews and edits text before finalizing the product, etc., such that 10x or even 100x compute is spent per word of generated text. This would mean $1–$10 per 700 words generated, which is maybe expensive enough for it to be outcompeted by humans in many applications.
I suspect you might be right. If we look at the human brain, every neuron is reused tens of times in the time it takes to say a single word, so it doesn’t seem unlikely that a good architecture would reuse neurons multiple times before “outputting” something. So an increase of about 10–100x, as you suggest, is not unlikely.