Papers like the recent one on eliminating matrix multiplication suggest that there is no need for warehouses full of GPUs to train advanced AI systems.
The paper is about getting rid of multiplication in inference, not in training (specifically, it focuses on attention rather than MLPs). Quantization-aware training creates models with extreme levels of quantization that are not much worse than full-precision models (this currently can't be done post-training, if training itself wasn't built around targeting this outcome). The important recent result is ternary quantization, where MLP weights become {-1, 0, +1}, so that multiplying by such a matrix no longer requires any actual multiplications, only additions and subtractions. So this is relevant for making inference cheaper or for running models locally.
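To make the ternary point concrete, here is a minimal sketch (not from the paper; the function name and shapes are illustrative) of why applying a {-1, 0, +1} weight matrix needs only additions and subtractions:

```python
import numpy as np

def ternary_matvec(W, x):
    """Compute W @ x without scalar multiplications.

    W: (m, n) matrix with entries in {-1, 0, +1}
    x: (n,) activation vector
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if W[i, j] == 1:
                out[i] += x[j]   # +1 weight: add the activation
            elif W[i, j] == -1:
                out[i] -= x[j]   # -1 weight: subtract it
            # 0 weight: contributes nothing, skip entirely
    return out

# Sanity check against an ordinary matmul
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)
assert np.allclose(ternary_matvec(W, x), W @ x)
```

A real kernel would vectorize this rather than loop, but the point stands: every weight application is an add, a subtract, or a no-op, which is what makes cheap inference hardware plausible.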
Good point.