They just made an experimental “long output” of up to 64K output tokens per request available to “alpha users”, and here is what they did for pricing https://openai.com/gpt-4o-long-output/:
Long completions are more costly from an inference perspective, so the per-token pricing of this model is increased to match the costs.
Thanks, I think I get it now. At least part of my confusion came from conflating a “transformer run” with “number of FLOPs”.
And I get the point about cost; that’s what I meant, but I articulated it poorly.
An extra recent data point: currently GPT-4o costs $5.00 / 1M input tokens and $15.00 / 1M output tokens https://openai.com/api/pricing/
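A quick sketch of what those quoted rates imply per request. The helper function and the example token counts are made up for illustration; only the two per-million-token prices come from the pricing page above:

```python
# Illustrative estimate of the USD cost of one GPT-4o request, using the
# quoted rates of $5.00 / 1M input tokens and $15.00 / 1M output tokens.
# `request_cost` and the token counts below are hypothetical, not an API.

INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the quoted rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A request with 2,000 input tokens that maxes out the new 64K output
# ceiling is dominated by the output side of the bill:
print(round(request_cost(2_000, 64_000), 2))  # 0.97
```

At these rates a single maximal long-output completion costs nearly a dollar, which makes the output-heavy pricing asymmetry very concrete.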
Interesting, thanks!