Some Twitter discussion: https://twitter.com/saprmarks/status/1715100934936854691
Recent papers demonstrating that LLMs are not myopic and that predictions of tokens beyond the immediate next token can be extracted from their hidden states (a rough sketch of the idea follows the list):
“Eliciting Latent Predictions from Transformers with the Tuned Lens”, Belrose et al 2023
“Jump to Conclusions: Short-Cutting Transformers With Linear Transformations”, Din et al 2023
“Future Lens: Anticipating Subsequent Tokens from a Single Hidden State”, Pal et al 2023
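A minimal sketch, not the exact method of any paper above: train a linear probe that maps a single intermediate hidden state at position t to the token at position t+2, i.e. one step beyond the next token, in the spirit of Future Lens. The model choice ("gpt2"), probe layer, and toy corpus are illustrative assumptions; the papers evaluate on held-out data from real corpora.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model = model.to(device).eval()

LAYER = 6    # intermediate layer to probe (assumption)
OFFSET = 2   # predict the token 2 positions ahead, not just the next token

text = "The quick brown fox jumps over the lazy dog. " * 50  # toy corpus (assumption)
ids = tok(text, return_tensors="pt", truncation=True, max_length=512).input_ids.to(device)

with torch.no_grad():
    hidden = model(ids).hidden_states[LAYER][0]   # (seq_len, d_model)

# Pair the hidden state at position t with the ground-truth token at t+OFFSET.
X = hidden[:-OFFSET]
y = ids[0, OFFSET:]

probe = torch.nn.Linear(model.config.n_embd, model.config.vocab_size).to(device)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(X), y)
    loss.backward()
    opt.step()

# Training accuracy on the toy corpus: the probe fits easily here, since this is a
# repeated sentence; the papers show the effect holds on real, held-out text.
acc = (probe(X).argmax(-1) == y).float().mean().item()
print(f"linear-probe accuracy at t+{OFFSET}: {acc:.2%}")
```

The point of the sketch is only that a single hidden state, read off well before the final layer, already carries recoverable information about tokens past the immediate next one; the cited papers make this rigorous with tuned/linear lenses and proper evaluations.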