Apparently Blackwell supports Microscaling, a block number format in which a group of numbers shares a single scaling factor, and 4-6 bit Microscaling can be used for training (not just inference) as a drop-in replacement for FP32 (see page 7). For inference, models created with quantization-aware training (as opposed to being quantized post-training) are approximately as good as models in high precision (given the same training data and number of parameters).
So appeals to FP4/FP6 performance are not empty marketing; the formats seem to have an actual, moderately straightforward use.
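To make the block-scaling idea concrete, here is a toy sketch of quantizing a block of floats to 4-bit signed integers that share one power-of-two scale. This is only an illustration of the concept, not Blackwell's actual MX implementation: real MX formats use FP4/FP6/FP8 element encodings and a fixed block size (e.g. 32), whereas this example uses plain integers and whatever block you hand it.

```python
import math

def mx_quantize(block, bits=4):
    """Quantize one block of floats to signed ints sharing a power-of-two scale.

    Toy illustration of Microscaling-style block quantization: every element
    in the block is stored in `bits` bits, and the whole block shares one
    scale factor (restricted to a power of two, as in the MX formats).
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed
    amax = max(abs(v) for v in block)
    if amax == 0:
        return [0] * len(block), 1.0
    # Smallest power-of-two scale such that amax / scale fits in [-qmax, qmax].
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return q, scale

def mx_dequantize(q, scale):
    """Reconstruct approximate floats from the shared-scale integer block."""
    return [v * scale for v in q]
```

For example, `mx_quantize([0.11, -0.42, 0.07, 0.35])` picks the shared scale 2^-4 = 0.0625 and stores the block as `[2, -7, 1, 6]`, so each element costs 4 bits plus an amortized share of one scale factor, and the per-element rounding error is at most half the scale.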