Are there any signs to be found in public that anyone is training 10B+ LLMs in a precision that is not 16 bits? There are experiments that are specifically about precision on smaller LLMs, but they don’t seem to get adopted in practice for larger models, despite the obvious advantage of getting to 2x the compute.
DeepSeek-V3 is one example: its technical report describes training with an FP8 mixed-precision framework. SemiAnalysis has also claimed that most frontier labs use FP8.
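For context on what moving below 16 bits actually trades away, here is a minimal sketch comparing the dynamic range and relative precision of the common training formats, computed from their (exponent bits, mantissa bits) layouts under the usual IEEE-style conventions. The format parameters are standard; the helper name is just for illustration.

```python
# Rough comparison of floating-point formats relevant to low-precision
# training: largest normal value and relative precision (machine epsilon).
# Each format is described by (exponent bits, mantissa bits), assuming an
# IEEE-style layout with exponent bias 2^(e-1) - 1.

def fmt_stats(exp_bits, man_bits):
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias   # all-ones exponent reserved for Inf/NaN
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    eps = 2.0 ** -man_bits                 # spacing between 1.0 and the next value
    return max_normal, eps

for name, e, m in [("FP16", 5, 10), ("BF16", 8, 7),
                   ("FP8 E5M2", 5, 2), ("FP8 E4M3", 4, 3)]:
    mx, eps = fmt_stats(e, m)
    print(f"{name:9s} max ~ {mx:.4g}  eps = {eps:g}")

# Caveat: the OCP FP8 E4M3 variant used in practice reclaims most of the
# all-ones-exponent encodings for finite values (keeping a single NaN),
# so its real maximum is 448, larger than this generic formula suggests.
```

The takeaway is that FP8 halves the mantissa relative to FP16, so per-tensor scaling (as in DeepSeek-V3's recipe) is needed to keep activations and gradients inside the representable range.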