Leon Lang comments on Transformers Represent Belief State Geometry in their Residual Stream

Leon Lang 17 Apr 2024 17:17 UTC
7 points
3
I really enjoyed reading this post! It’s quite well-written. Thanks for writing it.

The only critique is that I would have appreciated more details on how the linear regression parameters are trained and what exactly the projection is doing. John’s thread is a bit clarifying on this.
One question: If you optimize the representation in the residual stream such that it corresponds to a particular chosen belief state, does the transformer then predict the next token as if in that belief state? I.e., does the transformer use the belief state for making predictions?
- Adam Shai 17 Apr 2024 18:26 UTC
  1 point
  0
  Thanks! I appreciate the critique. From this comment and from John’s it seems correct and I’ll keep it in mind for the future.
  On the question: by "optimize the representation", do you mean causally intervening on the residual stream during inference (e.g. a patching experiment)? Or do you mean something else that involves backprop? If the first, then we haven’t tried, but definitely want to! It could be something someone does at the Hackathon, if interested ;)
  - Leon Lang 17 Apr 2024 20:46 UTC
    1 point
    0
    Yes, the first! Thanks for the link!