Why measure FLOPs at FP32? Haven't all the big training runs in the last two years used FP16?
Because there is more data available for FP32, which makes it easier to study trends.
We plan to release a piece soon on how the picture changes when you account for different number formats, and for the fact that most runs use hardware that is not the most cost-efficient.
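To illustrate why the number format matters here: a compute estimate derived from peak throughput scales directly with the precision assumed. The sketch below is a minimal illustration; the per-GPU peak-throughput figures, cluster size, duration, and utilization are all hypothetical assumptions, not measurements from any real run.

```python
# Hypothetical per-GPU peak throughput in TFLOP/s for each number format.
# These values are illustrative assumptions, not vendor measurements.
PEAK_TFLOPS = {"fp32": 19.5, "fp16": 312.0}

def training_flop(num_gpus: int, days: float, utilization: float, fmt: str) -> float:
    """Estimate total training FLOP assuming the given precision's peak throughput."""
    seconds = days * 24 * 3600
    return num_gpus * PEAK_TFLOPS[fmt] * 1e12 * utilization * seconds

# Same hypothetical run, estimated under each format's peak throughput.
fp32_est = training_flop(num_gpus=1000, days=30, utilization=0.4, fmt="fp32")
fp16_est = training_flop(num_gpus=1000, days=30, utilization=0.4, fmt="fp16")
print(f"FP32-based estimate: {fp32_est:.2e} FLOP")
print(f"FP16-based estimate: {fp16_est:.2e} FLOP ({fp16_est / fp32_est:.0f}x higher)")
```

Under these assumed peaks, the same run corresponds to a 16x larger FLOP figure at FP16, which is the kind of gap a format-aware trend analysis would need to correct for.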