OpenAI’s prices seem too low to recoup even part of their capital costs in a reasonable time given the volatile nature of the AI industry. Surely I’m missing something obvious?
Yes: batching. Efficient GPU inference batches many users' requests together, so each layer becomes a matrix-matrix multiplication rather than a vector-matrix multiplication. Reading the model weights from GPU memory is the dominant cost, and a batch reads them once and amortizes that cost across every request in it.
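A minimal NumPy sketch of the idea (the layer width and batch size here are illustrative, not OpenAI's actual numbers): the same weight matrix serves one request or sixty-four, but the batched case touches the weights only once.

```python
import numpy as np

d = 4096  # hypothetical hidden dimension of one layer
W = np.random.randn(d, d).astype(np.float32)  # the layer's weights

# Unbatched: a vector-matrix product. All of W is read from memory
# to produce output for a single request.
x = np.random.randn(d).astype(np.float32)
y = x @ W

# Batched: stack 64 requests into one matrix-matrix product.
# W is read once, and that memory traffic is shared by all 64 requests.
X = np.random.randn(64, d).astype(np.float32)
Y = X @ W

# Each row of the batched result matches the corresponding
# single-request computation.
print(y.shape, Y.shape)
```

Because a decoder forward pass is bound by memory bandwidth rather than arithmetic, throughput per GPU scales nearly linearly with batch size until the arithmetic units saturate, which is what lets per-token prices stay low.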