That estimate puts GPT-3 at about 500 billion floating point operations per word, 200x less than 100 trillion. If you think a human reads at 250 words per minute, then 6 cents for 750 words is $1.20/hour. Meanwhile, generating 250 words per minute takes only about 2 trillion operations per second, which at the hardware price cited below comes to roughly half a cent per hour. So the two estimates differ by about 250x.
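For concreteness, here is the API side of that arithmetic as a quick Python sketch. All the numbers are just the assumptions above, nothing is measured:

```python
# Back-of-the-envelope: what the OpenAI API charges at "human reading speed".
# Figures taken from the estimate above.

flops_per_word = 500e9       # ~500 billion ops per word for GPT-3
words_per_minute = 250       # assumed human reading speed
price_per_750_words = 0.06   # API price: 6 cents per 750 words

words_per_hour = words_per_minute * 60                       # 15,000 words
api_dollars_per_hour = words_per_hour / 750 * price_per_750_words
flops_per_second = flops_per_word * words_per_minute / 60

print(f"API cost: ${api_dollars_per_hour:.2f}/hour")        # $1.20/hour
print(f"Compute needed: {flops_per_second:.2e} ops/sec")    # ~2.1e12
```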
As a citation for the hardware cost:
P4d instances on EC2 currently cost $11.57/h if reserved for 3 years. They contain 8 A100s.
An A100 does about 624 trillion half-precision ops/second.
So that’s 430 trillion (operations per second) per ($/hour).
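And the hardware side, again as a sketch using only the figures above, which recovers the ~250x gap:

```python
# Hardware cost of GPT-3 inference, assuming the P4d / A100 figures above.

p4d_dollars_per_hour = 11.57   # 3-year reserved EC2 price
a100s_per_instance = 8
flops_per_a100 = 624e12        # half-precision ops/sec

hardware_rate = a100s_per_instance * flops_per_a100 / p4d_dollars_per_hour
print(f"{hardware_rate:.2e} (ops/sec) per ($/hour)")        # ~4.3e14

# Cost of the ~2.1e12 ops/sec needed for human reading speed (from the sketch above)
flops_needed = 500e9 * 250 / 60
hardware_dollars_per_hour = flops_needed / hardware_rate
print(f"Hardware cost: ${hardware_dollars_per_hour:.4f}/hour")       # ~$0.005
print(f"API / hardware ratio: {1.20 / hardware_dollars_per_hour:.0f}x")  # ~250x
```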
You shouldn't expect to get full utilization out of that for a variety of reasons, but in the very long run you should be getting reasonably close, certainly more than 100 trillion operations per second per ($/hour).
(ETA: But note that a service like the OpenAI API running on EC2 would need to use on-demand prices, which are about 10x higher per flop, if it wants reasonable availability.)