Benjamin_Todd comments on How much AI inference can we do?

Benjamin_Todd 18 May 2024 18:31 UTC
1 point
I agree the lower bound for output tokens isn’t very tight. I’d be very interested to hear of other simple rules of thumb that would give a tighter one.
I’ll add a note to the section on input tokens saying that, since they don’t require a KV cache, it’s possible to get much closer to the upper bound.
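For readers who want to see the gap between the two bounds, here is a rough sketch. It assumes the common rules of thumb (about 2 FLOPs per parameter per token for the compute-limited upper bound, and one full read of the weights per decoding step for the memory-bandwidth-limited lower bound), which may not match the post's exact method. The function names and the hardware figures are hypothetical round numbers chosen for illustration, and KV cache reads are ignored, so the real output figure could be lower still.

```python
def compute_bound_tokens_per_s(flops_per_s: float, n_params: float) -> float:
    """Upper bound: assumes ~2 FLOPs per parameter per token and perfect utilisation."""
    return flops_per_s / (2 * n_params)


def bandwidth_bound_tokens_per_s(
    bytes_per_s: float,
    n_params: float,
    bytes_per_param: float = 2.0,  # assumption: 16-bit weights
    batch_size: int = 1,
) -> float:
    """Bandwidth-limited estimate: every decoding step reads all weights once.

    The weight read is shared across the batch, so throughput scales with
    batch_size. KV cache traffic is ignored here (a simplifying assumption).
    """
    return batch_size * bytes_per_s / (n_params * bytes_per_param)


# Hypothetical accelerator and model, round numbers for illustration only.
FLOPS_PER_S = 1e15      # assumed peak compute
BYTES_PER_S = 3e12      # assumed memory bandwidth
N_PARAMS = 70e9         # assumed dense model size

upper = compute_bound_tokens_per_s(FLOPS_PER_S, N_PARAMS)
lower = bandwidth_bound_tokens_per_s(BYTES_PER_S, N_PARAMS)

print(f"compute-limited upper bound:   {upper:,.0f} tokens/s")
print(f"bandwidth-limited lower bound: {lower:,.0f} tokens/s (batch size 1)")
```

Input tokens can be processed in parallel in a single pass, so they behave more like the first function; output tokens at small batch sizes behave more like the second, which is why the two figures can sit far apart.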