boubounet comments on Memory bandwidth constraints imply economies of scale in AI inference

boubounet 18 Sep 2023 16:22 UTC
On a different topic, but responding to the same quote: advances in model quantization, which significantly reduce a model's memory consumption during inference without substantially degrading its performance, might also mitigate the imbalance between ALU throughput and memory bandwidth. This might only push the problem back by a constant factor (e.g. 4x when going from 16-bit to 4-bit weights) rather than solve it, but I still think it's worth mentioning.
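To make the argument concrete, here is a rough roofline-style sketch of how quantization changes the arithmetic intensity of batched decoding. It assumes that each decoding step reads every weight once (shared across the batch), that each weight contributes one multiply-add (2 FLOPs) per sequence, and it ignores KV-cache and activation traffic. The accelerator figures (300 TFLOP/s, 2 TB/s) and the helper names are hypothetical, chosen only for illustration.

```python
"""Back-of-the-envelope roofline sketch: weight quantization vs. arithmetic intensity.

Assumptions (illustrative only):
- Each decoding step reads every weight once, shared across the whole batch.
- Each weight contributes one multiply-add (2 FLOPs) per sequence in the batch.
- KV-cache and activation memory traffic are ignored.
- Peak FLOP/s is treated as fixed regardless of weight precision.
- Hardware numbers are hypothetical round figures.
"""


def arithmetic_intensity(bits_per_weight: float, batch_size: int) -> float:
    """FLOPs performed per byte of weights read during one decoding step."""
    flops_per_weight = 2 * batch_size       # one multiply-add per sequence
    bytes_per_weight = bits_per_weight / 8  # memory traffic per weight
    return flops_per_weight / bytes_per_weight


def batch_to_saturate_compute(bits_per_weight: float, peak_flops: float, mem_bw: float) -> float:
    """Batch size at which weight traffic stops being the bottleneck."""
    machine_balance = peak_flops / mem_bw  # FLOPs the chip can do per byte moved
    return machine_balance / arithmetic_intensity(bits_per_weight, 1)


if __name__ == "__main__":
    PEAK_FLOPS = 300e12  # hypothetical accelerator: 300 TFLOP/s
    MEM_BW = 2e12        # hypothetical accelerator: 2 TB/s
    for bits in (16, 8, 4):
        ai = arithmetic_intensity(bits, batch_size=1)
        b = batch_to_saturate_compute(bits, PEAK_FLOPS, MEM_BW)
        print(f"{bits:>2}-bit weights: {ai:.0f} FLOP/byte at batch 1, "
              f"compute-bound from batch ~{b:.1f}")
```

Under these assumptions the batch size needed to saturate compute falls from about 150 (16-bit) to 37.5 (4-bit): quantization scales the imbalance down by a factor of 16/bits, but does not remove it.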