Jacob Pfau comments on Comparing Quantized Performance in Llama Models

Jacob Pfau 15 Jul 2024 16:20 UTC
3 points
It’s surprising to me that a model as heavily over-trained as Llama-3-8B can still be quantized to 4 bits without a noticeable quality drop. Intuitively (and I thought I saw this somewhere in a paper or tweet), I’d have expected over-training to significantly increase quantization sensitivity. Thanks for doing this!