I’m honestly always amazed by just how much money some people in these parts seem to have. That’s a huge sum to spend on an LLM experiment. It would be pretty large even for a research group to burn in just 6 days!
The money is not what you should take away; there are many ways to spend money: IP licenses, software, interns, buildings, consultants… What you should take away is the compute that that money bought (given that profit margins for OpenAI seem to be minimal, and the cost ~ compute).
Behind many key results, there is an unspoken level of compute. (Where did that KataGo or grokking result come from? They came from letting the GPUs run a lot longer than planned, either because of spare capacity or forgetfulness.) You will not find that in the papers, usually, but you should always remember this; it’s sorta like the line usually attributed to Balzac: ‘behind every great result, there is a great compute’. And this is why compute drives AI progress, and always has, and has always been underestimated as a driver: Compute Rules Everything Around Me.
yeah he’s just listing API costs