I agree the lower bound for output isn’t very tight. I’d be very interested to hear other simple rules of thumb you could use to provide a tighter one.
I’ll add a note to the section on input tokens that since they don’t require KV cache, it’s possible to get much closer to the upper bound.