Is it accurate to summarize the headline result as follows?
Train a Transformer to do next-token prediction on data generated by an HMM.
One optimal predictor for this data maintains a belief over which of the three HMM states we are in and performs a Bayesian update on that belief with each new token. That is, it tracks p(hidden state = H_i); see the sketch below.
Key result: a linear probe on the residual stream can reconstruct p(hidden state = H_i).
(I don’t know what Computational Mechanics or MSPs are, so this could be totally off.)
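For concreteness, here is a minimal sketch of the Bayesian update meant above, assuming the HMM is given by a transition matrix T and an emission matrix E. The toy numbers are made up for illustration, not the actual process from the post:

```python
import numpy as np

def update_belief(belief, token, T, E):
    """One step of Bayesian filtering over the HMM's hidden states.

    belief : (n_states,) current p(hidden state = H_i | tokens so far)
    token  : index of the token just observed
    T      : (n_states, n_states) transitions, T[i, j] = p(next state H_j | state H_i)
    E      : (n_states, n_tokens) emissions,   E[j, k] = p(token k | state H_j)
    """
    predicted = belief @ T               # propagate the belief through the hidden dynamics
    posterior = predicted * E[:, token]  # reweight by how likely each state was to emit the token
    return posterior / posterior.sum()   # renormalize to a probability distribution

# Toy 3-state HMM (illustrative numbers only).
T = np.array([[0.80, 0.10, 0.10],
              [0.10, 0.80, 0.10],
              [0.10, 0.10, 0.80]])
E = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

belief = np.ones(3) / 3          # uniform prior over the three hidden states
for tok in [0, 0, 2, 1]:         # some observed token sequence
    belief = update_belief(belief, tok, T, E)
print(belief)                    # p(hidden state = H_i | tokens so far)
```

Under this convention the optimal next-token distribution is just (belief @ T) @ E, so this belief vector is all an optimal predictor needs to carry forward.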
Part of what this all illustrates is that the fractal shape is kinda… baked into any Bayesian-ish system tracking the hidden state of the Markov model. So in some sense, it’s not very surprising to find it linearly embedded in the activations of the residual stream; all that really means is that the probabilities for each hidden state are linearly represented in the residual stream.
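To make “linearly represented” concrete: the probe is just one affine map from activations to the belief vector. A sketch with random stand-in arrays (in the actual experiment, acts would be residual-stream activations at each position and beliefs the ground-truth belief states from the filter above), so this only illustrates the fitting step, not the result:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data so the snippet runs on its own; the real setup would substitute
# actual residual-stream activations and the matching p(hidden state = H_i) targets.
rng = np.random.default_rng(0)
n_samples, d_model, n_states = 10_000, 512, 3
acts = rng.normal(size=(n_samples, d_model))
beliefs = rng.dirichlet(np.ones(n_states), size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(acts, beliefs, test_size=0.2, random_state=0)
probe = LinearRegression().fit(X_train, y_train)       # one affine map: d_model -> n_states
mse = ((probe.predict(X_test) - y_test) ** 2).mean()   # held-out error; low error would mean
print(mse)                                             # the belief vector is linearly readable
```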
EDIT: Looks like yes. From this post:
That is a fair summary.
As well as inferring the HMM itself from the data.