Isn’t that math wrong? 17 trillion parameters is 100x more than GPT-3, so the cost should be at least 100x higher, so if the cost is $12M now it should be at least a billion dollars. I think it would be about $3B. It would probably cost a bit more than that since the scaling law will probably bend soon and more data will be needed per parameter. Also there may be inefficiencies from doing things at that scale. I’d guess maybe $10B give or take an order of magnitude.
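For anyone who wants to check the arithmetic, here is a rough sketch of the scaling estimate. It assumes GPT-3 has 175 billion parameters and takes the $12M training cost quoted above at face value. The `exponent` parameter is a hypothetical knob of my own, not something from the original calculation: 1.0 means cost grows in proportion to parameter count, and anything above 1.0 stands in for "cost scales faster than the number of parameters".

```python
# Back-of-the-envelope training cost scaling (illustrative only).
GPT3_PARAMS = 175e9            # assumption: GPT-3 parameter count
GPT3_TRAINING_COST_USD = 12e6  # figure quoted in the thread
TARGET_PARAMS = 17e12          # the 17-trillion-parameter model discussed


def scaled_training_cost(base_cost, base_params, target_params, exponent=1.0):
    """Scale base_cost by (target_params / base_params) ** exponent.

    exponent is a hypothetical parameter: 1.0 is linear scaling,
    values above 1.0 model cost growing faster than parameter count.
    """
    ratio = target_params / base_params
    return base_cost * ratio ** exponent


linear_cost = scaled_training_cost(
    GPT3_TRAINING_COST_USD, GPT3_PARAMS, TARGET_PARAMS
)
superlinear_cost = scaled_training_cost(
    GPT3_TRAINING_COST_USD, GPT3_PARAMS, TARGET_PARAMS, exponent=1.5
)

print(f"linear scaling:       ${linear_cost / 1e9:.2f}B")
print(f"exponent 1.5 scaling: ${superlinear_cost / 1e9:.2f}B")
```

Linear scaling gives the roughly billion-dollar floor; any exponent above 1.0 pushes the estimate higher, which is the direction of the argument here. The 1.5 is only an example value, not a fitted scaling law.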
You are absolutely correct: the cost must be more than 100x higher if cost scales faster than the number of parameters. I have now updated the calculations and got a 727x increase in cost.
Maybe the relevant sort of AI system won’t just stream-of-consciousness generate words like GPT-3 does, but will rather be some sort of internal bureaucracy of prompt programming that e.g. takes notes to itself, spins off sub-routines to do various tasks like looking up facts, reviews and edits text before finalizing the product, etc., such that 10x or even 100x more compute is spent per word of generated text. This would mean $1-$10 per 700 words generated, which may be expensive enough for humans to outcompete it in many applications.
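A quick sketch of that per-word arithmetic. The $0.10 baseline per 700 words is not stated anywhere; it is just what the $1-$10 range implies when divided by the 10x-100x multipliers, so treat it as an assumption. The function name is made up for illustration.

```python
# Assumption: baseline generation cost implied by "$1 at 10x compute".
BASE_COST_PER_700_WORDS_USD = 0.10


def cost_per_700_words(compute_multiplier, base_cost=BASE_COST_PER_700_WORDS_USD):
    """Cost of generating 700 words when each word uses
    compute_multiplier times the baseline compute."""
    return base_cost * compute_multiplier


# Compare plain generation with the 10x and 100x "bureaucracy" overheads.
for multiplier in (1, 10, 100):
    cost = cost_per_700_words(multiplier)
    print(f"{multiplier:>3}x compute: ${cost:.2f} per 700 words")
```

Whether $1-$10 per 700 words is competitive depends on what a human would charge for the same text, which this sketch does not try to model.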
I suspect you might be right. If we imagine the human brain, every neuron is reused tens of times in the time it takes to say a single word, so it doesn’t seem unlikely that a good architecture reuses neurons multiple times before “outputting” something. So I think an increase of about 10-100x, as you say, is not unlikely.