Perhaps what you meant is that latency will be high but this isn’t a problem as long as you have high throughput. That’s basically true for training. But this post is about inference, where latency matters a lot more.
(It depends on the application, of course, but the ZeRO Infinity approach can make your model so slow that you don’t want to interact with it in real time, even at GPT-3 scale.)