SoerenMind comments on Inference cost limits the impact of ever larger models

SoerenMind 24 Oct 2021 18:24 UTC
3 points
You may have better info, but I’m not sure I expect 1000x better serial speed than humans (at least not from innovations in the next decade). Latency is already a bottleneck in practice, despite efforts to reduce it. Width-wise parallelism has its limits, and depth-wise or data-wise parallelism doesn’t improve latency: layers still have to run one after another, and processing more inputs in parallel raises throughput without making any single forward pass faster. For example, GPT-3 already has high latency compared to smaller models, and making it 10^3x or 10^6x bigger won’t reduce that latency.
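
To make the parallelism point concrete, here is a toy latency model. It is only a sketch under my own assumptions, not a measurement of GPT-3 or any real system: the function names, the cost constants, and the log2 synchronization term are all hypothetical. It assumes layers run strictly in sequence, that the work inside a layer scales with width and splits evenly across devices, and that splitting adds a communication cost that grows with the number of devices.

```python
import math


def forward_latency(depth, width, devices_per_layer,
                    per_unit_cost=1.0, sync_cost=5.0):
    """Toy latency (arbitrary units) of one forward pass for a single input.

    All constants are illustrative assumptions, not measured values.
    """
    # Width-wise parallelism: a layer's work is split across devices.
    compute = per_unit_cost * width / devices_per_layer
    # Assumed communication overhead that grows with the device count.
    sync = sync_cost * math.log2(devices_per_layer) if devices_per_layer > 1 else 0.0
    # Depth is serial: each layer waits for the previous one.
    return depth * (compute + sync)


def throughput(replicas, latency):
    """Inputs finished per time unit with data-parallel model replicas."""
    return replicas / latency


if __name__ == "__main__":
    # Width-wise parallelism helps at first, then overhead dominates.
    for devices in (1, 4, 64, 1024):
        print(devices, forward_latency(depth=10, width=100, devices_per_layer=devices))

    # Data parallelism: more replicas, same latency per input.
    latency = forward_latency(depth=10, width=100, devices_per_layer=4)
    print(throughput(1, latency), throughput(8, latency))

    # A 10x deeper model is 10x slower per input under these assumptions.
    print(forward_latency(depth=100, width=100, devices_per_layer=4))
```

In this sketch, adding devices per layer lowers latency only until the assumed synchronization cost outweighs the compute saved. Adding replicas multiplies throughput but leaves per-input latency unchanged, and adding depth increases latency linearly.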