Some Twitter discussion: https://twitter.com/saprmarks/status/1715100934936854691
Recent papers demonstrating that LLMs are not myopic and that predictions of tokens beyond the immediate next token can be extracted from their hidden states (a rough sketch of the idea follows the list):
“Eliciting Latent Predictions from Transformers with the Tuned Lens”, Belrose et al 2023
“Jump to Conclusions: Short-Cutting Transformers With Linear Transformations”, Din et al 2023
“Future Lens: Anticipating Subsequent Tokens from a Single Hidden State”, Pal et al 2023
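A minimal sketch, not the exact method of any paper above: train a linear probe that maps a single intermediate hidden state at position t to the token at position t+2, i.e. one step beyond the next token, in the spirit of Future Lens. The model choice ("gpt2"), probe layer, and toy corpus are illustrative assumptions; the papers evaluate on held-out data from real corpora.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model = model.to(device).eval()

LAYER = 6    # intermediate layer to probe (assumption)
OFFSET = 2   # predict the token 2 positions ahead, not just the next token

text = "The quick brown fox jumps over the lazy dog. " * 50  # toy corpus (assumption)
ids = tok(text, return_tensors="pt", truncation=True, max_length=512).input_ids.to(device)

with torch.no_grad():
    hidden = model(ids).hidden_states[LAYER][0]   # (seq_len, d_model)

# Pair the hidden state at position t with the ground-truth token at t+OFFSET.
X = hidden[:-OFFSET]
y = ids[0, OFFSET:]

probe = torch.nn.Linear(model.config.n_embd, model.config.vocab_size).to(device)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(X), y)
    loss.backward()
    opt.step()

# Training accuracy on the toy corpus: the probe fits easily here, since this is a
# repeated sentence; the papers show the effect holds on real, held-out text.
acc = (probe(X).argmax(-1) == y).float().mean().item()
print(f"linear-probe accuracy at t+{OFFSET}: {acc:.2%}")
```

The point of the sketch is only that a single hidden state, read off well before the final layer, already carries recoverable information about tokens past the immediate next one; the cited papers make this rigorous with tuned/linear lenses and proper evaluations.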