Is it accurate to summarize the headline result as follows?
Train a Transformer to do next-token prediction on data generated by an HMM.
One optimal predictor for this data maintains a belief over which of the three HMM states we are in and performs a Bayesian update on that belief with each new token. That is, it tracks p(hidden state = H_i); see the sketch below.
Key result: a linear probe on the residual stream can reconstruct p(hidden state = H_i).
(I don’t know what Computational Mechanics or MSPs are, so this could be totally off.)
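For concreteness, here is a minimal sketch of the Bayesian update meant above, assuming the HMM is given by a transition matrix T and an emission matrix E. The toy numbers are made up for illustration, not the actual process from the post:

```python
import numpy as np

def update_belief(belief, token, T, E):
    """One step of Bayesian filtering over the HMM's hidden states.

    belief : (n_states,) current p(hidden state = H_i | tokens so far)
    token  : index of the token just observed
    T      : (n_states, n_states) transitions, T[i, j] = p(next state H_j | state H_i)
    E      : (n_states, n_tokens) emissions,   E[j, k] = p(token k | state H_j)
    """
    predicted = belief @ T               # propagate the belief through the hidden dynamics
    posterior = predicted * E[:, token]  # reweight by how likely each state was to emit the token
    return posterior / posterior.sum()   # renormalize to a probability distribution

# Toy 3-state HMM (illustrative numbers only).
T = np.array([[0.80, 0.10, 0.10],
              [0.10, 0.80, 0.10],
              [0.10, 0.10, 0.80]])
E = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

belief = np.ones(3) / 3          # uniform prior over the three hidden states
for tok in [0, 0, 2, 1]:         # some observed token sequence
    belief = update_belief(belief, tok, T, E)
print(belief)                    # p(hidden state = H_i | tokens so far)
```

Under this convention the optimal next-token distribution is just (belief @ T) @ E, so this belief vector is all an optimal predictor needs to carry forward.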
Part of what this all illustrates is that the fractal shape is kinda… baked into any Bayesian-ish system tracking the hidden state of the Markov model. So in some sense, it’s not very surprising to find it linearly embedded in the activations of the residual stream; all that really means is that the probabilities for each hidden state are linearly represented in the residual stream.
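To make “linearly represented” concrete: the probe is just one affine map from activations to the belief vector. A sketch with random stand-in arrays (in the actual experiment, acts would be residual-stream activations at each position and beliefs the ground-truth belief states from the filter above), so this only illustrates the fitting step, not the result:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data so the snippet runs on its own; the real setup would substitute
# actual residual-stream activations and the matching p(hidden state = H_i) targets.
rng = np.random.default_rng(0)
n_samples, d_model, n_states = 10_000, 512, 3
acts = rng.normal(size=(n_samples, d_model))
beliefs = rng.dirichlet(np.ones(n_states), size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(acts, beliefs, test_size=0.2, random_state=0)
probe = LinearRegression().fit(X_train, y_train)       # one affine map: d_model -> n_states
mse = ((probe.predict(X_test) - y_test) ** 2).mean()   # held-out error; low error would mean
print(mse)                                             # the belief vector is linearly readable
```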
EDIT: Looks like yes. From this post:
That is a fair summary.
As well as inferring the HMM itself from the data.