Charles Wang has just posted a short tweet thread which begins like this:
Next-token prediction is an appropriate framing in LLM pretraining, but a misframing at inference, because it doesn't capture what's actually happening: what matters is that which gives rise to the next token.