We are also now offering dedicated instances for users who want deeper control over the specific model version and system performance. By default, requests are run on compute infrastructure shared with other users, who pay per request. Our API runs on Azure, and with dedicated instances, developers will pay by time period for an allocation of compute infrastructure that’s reserved for serving their requests.
Developers get full control over the instance’s load (higher load improves throughput but makes each request slower), the option to enable features such as longer context limits, and the ability to pin the model snapshot.
Dedicated instances can make economic sense for developers running beyond ~450M tokens per day.
that suggests one shared “instance” is capable of processing >450M tokens per day, i.e. ~$900/day of API fees at this new rate. i don’t know what exactly their infrastructure looks like, but the marginal compute costs here have still got to be an order of magnitude lower than what they’re charging (which is sensible: they have fixed costs to recoup, and they’re seeking to profit).
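for the record, the arithmetic behind that $900 figure, as a quick back-of-envelope script. note the per-token price is an assumption on my part: the excerpt never states the rate, and i’m taking “this new rate” to be $0.002 per 1K tokens.

```python
# sanity check on the ~$900/day figure.
# ASSUMPTION: "this new rate" = $0.002 per 1K tokens (not stated in the excerpt).
PRICE_PER_1K_TOKENS = 0.002       # USD, assumed rate
TOKENS_PER_DAY = 450_000_000      # breakeven volume quoted in the announcement

daily_api_fees = TOKENS_PER_DAY / 1_000 * PRICE_PER_1K_TOKENS
print(f"${daily_api_fees:,.0f}/day")  # → $900/day
```

if the assumed rate is right, a dedicated instance would have to cost less than ~$900/day to make economic sense at that volume, which is presumably roughly where they’ve priced it.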