For all practical purposes, it takes O(N+M) compute to generate N tokens from an M-token context.
Yup. I suspect that’s close to the root of the confusion/apparent disagreement earlier: when I say "constant time," I mean constant with respect to the input, given a particular model and a bounded context window, for a single token.
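A minimal sketch of that per-token view, assuming a decoder-only transformer with a KV cache (so each position is processed once) and a hard context window. The function names `per_token_cost` and `generation_cost` and the cost constants are hypothetical and purely illustrative, not measured numbers.

```python
# Toy cost model (hypothetical units), assuming a KV cache and a hard context window.

def per_token_cost(context_len: int, window: int,
                   dense_cost: int = 100, attn_cost_per_pos: int = 1) -> int:
    """Cost to process one token that attends to at most `window` earlier positions.

    `dense_cost` stands in for the fixed per-token work (projections, MLP);
    attention work grows with the number of visible positions, capped by the window.
    """
    visible = min(context_len, window)
    return dense_cost + attn_cost_per_pos * visible


def generation_cost(prompt_len: int, new_tokens: int, window: int) -> int:
    """Total cost to process an M-token prompt and then generate N tokens.

    With a KV cache each position is processed exactly once, so the total is a
    sum of M + N per-token costs, each bounded by a constant for a fixed window.
    """
    total = 0
    for pos in range(prompt_len + new_tokens):
        total += per_token_cost(pos, window)
    return total
```

Because `visible` never exceeds `window`, each step costs at most `dense_cost + attn_cost_per_pos * window` regardless of what the input is, so the total is bounded by (N+M) times that constant.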
I think doing the analysis at this level is often more revealing than doing it across full trajectories or arbitrary windows, and in an important way: a tight bound makes it easier to make claims about what’s possible by existence proof (which turns out to be a lot).