I recently estimated the training cost of PaLM to be around $9M to $17M. Please note all the caveats; this only estimates the cost of the final training run using commercial cloud computing (Google’s TPUv3).
As already said, a 10T-parameter model trained according to Chinchilla scaling laws would require around 1.3×10^28 FLOPs. That’s 5200x more compute than PaLM (2.5×10^24 FLOPs).
Therefore, 5200 × [$9M; $17M] = [$46.8B; $88.4B].
So a conservative estimate is around $47 to $88 billion.
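As a quick sanity check, here is a minimal sketch of the arithmetic in Python (all inputs are the estimates quoted above, not new data):

```python
# Back-of-the-envelope scaling of PaLM's training cost to a 10T-parameter
# Chinchilla-optimal model, using the numbers from the text above.
palm_flops = 2.5e24                          # estimated PaLM training compute
target_flops = 1.3e28                        # ~10T params, Chinchilla-optimal
palm_cost_low, palm_cost_high = 9e6, 17e6    # estimated PaLM cost range in USD

scale = target_flops / palm_flops            # ≈ 5200x more compute
print(f"compute multiplier: {scale:.0f}x")
print(f"cost range: ${scale * palm_cost_low / 1e9:.1f}B "
      f"to ${scale * palm_cost_high / 1e9:.1f}B")
# -> compute multiplier: 5200x
# -> cost range: $46.8B to $88.4B
```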
It took Google 64 days to train PaLM using more than 6,000 TPU chips. Using the same setup (which is probably one of the most interconnected and capable ML training systems out there), it’d take 912 years.
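The same multiplier applied to wall-clock time gives the 912-year figure; a minimal sketch, assuming the training run scales linearly on identical hardware:

```python
# PaLM took 64 days on its TPU pod; 5200x more compute on the same
# hardware would take 5200x longer (assuming no hardware or efficiency gains).
palm_days = 64
scale = 5200                       # compute multiplier from above
years = palm_days * scale / 365
print(f"{years:.0f} years")        # -> 912 years
```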