I’m trying to reconcile:

consider that writing 750 words with GPT-3 costs 6 cents.
vs
It costs well under $1/hour to rent hardware that performs 100 trillion operations per second. If a model using that much compute (something like 3 orders of magnitude more than GPT-3)...
That estimate puts GPT-3 at about 500 billion floating point operations per word, 200x less than 100 trillion. If you assume a human reads at 250 words per minute, then 6 cents for 750 words works out to $1.20/hour. At the ~430 trillion (ops/s) per ($/hour) hardware rate cited below, GPT-3’s ~2.1 trillion ops/s at reading speed would cost about $0.005/hour, so the two estimates differ by about 250x. Someone can correct me if I’m misunderstanding.
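To make the 250x explicit, here is the same arithmetic as a quick Python sketch. All inputs are the thread’s own estimates; the 430 trillion (ops/s)/($/hour) figure comes from the hardware citation further down.

```python
# Back-of-the-envelope reconciliation of the two estimates.
# All inputs are rough estimates quoted in this thread.

ops_per_word = 500e9   # ~500 billion FLOPs per generated word (GPT-3)
words_per_min = 250    # assumed human reading speed
api_price = 0.06       # $ per 750 words (OpenAI API)
hw_rate = 430e12       # (ops/s) per ($/hour), from the EC2/A100 figures below

# What the API charges, expressed per hour of reading-speed output:
words_per_hour = words_per_min * 60                    # 15,000 words/hour
api_price_per_hour = api_price * words_per_hour / 750  # $1.20/hour

# What raw hardware rental would cost for the same output:
ops_per_sec = ops_per_word * words_per_min / 60  # ~2.1e12 ops/s
hw_price_per_hour = ops_per_sec / hw_rate        # ~$0.005/hour

print(f"API:      ${api_price_per_hour:.2f}/hour")
print(f"Hardware: ${hw_price_per_hour:.4f}/hour")
print(f"Ratio:    {api_price_per_hour / hw_price_per_hour:.0f}x")  # ~250x
```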
That’s easy to reconcile! OpenAI is selling access to GPT-3 wayyyy above its own marginal hardware rental cost. Right? That would hardly be surprising; pricing decisions usually involve other things besides marginal costs, like price elasticity of demand, capacity to scale up, and so on. (And/or OpenAI’s marginal costs include things that are not hardware rental, e.g. human monitoring and approval processes.) But as soon as there’s some competition (especially competition from open-source projects) I expect the price to rapidly approach the hardware rental cost (including electricity).
As a citation for the hardware cost:
P4d instances on EC2 currently cost $11.57/h if reserved for 3 years. They contain 8 A100s.
An A100 does about 624 trillion half-precision ops/second.
So that’s 430 trillion (operations per second) per ($/hour).
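As a quick check, those two numbers multiply out to the quoted figure:

```python
# Sanity check of the ops-per-dollar figure (numbers as quoted above).
p4d_price_per_hour = 11.57  # $/hour, EC2 P4d, 3-year reserved
a100s_per_instance = 8
ops_per_a100 = 624e12       # half-precision ops/second per A100

ops_per_dollar_hour = a100s_per_instance * ops_per_a100 / p4d_price_per_hour
print(f"{ops_per_dollar_hour / 1e12:.0f} trillion (ops/s) per ($/hour)")  # -> 431
```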
You shouldn’t expect to be able to get full utilization out of that for a variety of reasons, but in the very long run you should be getting reasonably close, certainly more than 100 trillion operations per second per ($/hour).
(ETA: But note that a service like the OpenAI API using EC2 would need to use on-demand prices, which are about 10x higher per flop if you want reasonable availability.)
Limitation: the price has to cover more than the cost of compute. It also includes additions for:
a) profit
b) recouping the cost of training or acquiring the model
Offering an additional feature like human monitoring/approval also raises costs (though in principle it could increase quality). The toy sketch below illustrates how these components could stack.
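A minimal sketch of how those components could combine into a per-word price. Every input below is a hypothetical placeholder, not a claim about OpenAI’s actual costs; the point is just that under assumptions like these, amortizing training can dominate the raw compute cost.

```python
# Toy price decomposition (all inputs hypothetical, for illustration only).
hardware_cost_per_word = 0.005 / 15000  # $/word, from the ~$0.005/hour estimate above
training_cost = 5e6                     # assumed one-time training/acquisition cost ($)
lifetime_words = 1e11                   # assumed total words served over the model's life
monitoring_multiplier = 2.0             # assumed overhead for human monitoring/approval
profit_margin = 0.5                     # assumed 50% margin

amortized_training = training_cost / lifetime_words
price_per_word = (hardware_cost_per_word + amortized_training) \
                 * monitoring_multiplier * (1 + profit_margin)
print(f"${price_per_word * 750:.3f} per 750 words")  # compare: $0.06 quoted above
```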