Apparently Blackwell supports Microscaling, a block number format in which a group of numbers shares a single scaling factor, and 4-6 bit Microscaling can be used for training (not just inference) as a drop-in replacement for FP32 (see page 7). For inference, models created with quantization-aware training (as opposed to being quantized post-training) are approximately as good as models in high precision (given the same training data and number of parameters).
So appeals to FP4/FP6 performance are not empty marketing; the formats seem to have an actual, moderately straightforward use.
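To make the block-scaling idea concrete, here is a toy sketch of quantizing a block of floats to 4-bit signed integers that share one power-of-two scale. This is only an illustration of the concept, not Blackwell's actual MX implementation: real MX formats use FP4/FP6/FP8 element encodings and a fixed block size (e.g. 32), whereas this example uses plain integers and whatever block you hand it.

```python
import math

def mx_quantize(block, bits=4):
    """Quantize one block of floats to signed ints sharing a power-of-two scale.

    Toy illustration of Microscaling-style block quantization: every element
    in the block is stored in `bits` bits, and the whole block shares one
    scale factor (restricted to a power of two, as in the MX formats).
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed
    amax = max(abs(v) for v in block)
    if amax == 0:
        return [0] * len(block), 1.0
    # Smallest power-of-two scale such that amax / scale fits in [-qmax, qmax].
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return q, scale

def mx_dequantize(q, scale):
    """Reconstruct approximate floats from the shared-scale integer block."""
    return [v * scale for v in q]
```

For example, `mx_quantize([0.11, -0.42, 0.07, 0.35])` picks the shared scale 2^-4 = 0.0625 and stores the block as `[2, -7, 1, 6]`, so each element costs 4 bits plus an amortized share of one scale factor, and the per-element rounding error is at most half the scale.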