I get 1e7 using 16 bit-flips per bfloat16 operation, a 300 K operating temperature, and 312 Tflop/s (from Nvidia’s spec sheet). My guess is that this is a little high, because a float multiplication involves more operations than just flipping 16 bits, but it’s the right order of magnitude.
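For anyone who wants to check the arithmetic, here is a rough sketch of how a number like this could be reproduced. It assumes the 1e7 is the ratio of the GPU's actual power draw to the Landauer-limit power (k·T·ln 2 per bit flip), which the comment above does not spell out. The 400 W power figure is also my assumption, not something stated above, and the function and parameter names are made up for illustration.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_ratio(power_w, ops_per_s, bit_flips_per_op, temperature_k):
    """Ratio of actual power draw to the Landauer-limit power.

    Assumes each operation costs exactly `bit_flips_per_op` irreversible
    bit flips, each at the minimum energy k * T * ln(2).
    """
    energy_per_bit_j = BOLTZMANN_J_PER_K * temperature_k * math.log(2)
    min_power_w = energy_per_bit_j * bit_flips_per_op * ops_per_s
    return power_w / min_power_w


ratio = landauer_ratio(
    power_w=400.0,        # ASSUMPTION: board power, not given in the comment
    ops_per_s=312e12,     # 312 Tflop/s, from the comment
    bit_flips_per_op=16,  # 16 bit-flips per bfloat16 op, from the comment
    temperature_k=300.0,  # 300 K, from the comment
)
print(f"actual / Landauer limit: {ratio:.1e}")
```

With these inputs the ratio lands in the 1e7 range. Counting more than 16 bit flips per operation raises the Landauer-limit power and so lowers the ratio, which matches the guess that the estimate is a little high.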