Lol! I guess if there was a more precise theorem statement in the vicinity of what was being gestured at, it wasn't nonsense? But in any case, I agree the original presentation is dreadful. John's is much better.
I would be curious to hear a *precise* statement of why the result here follows from the Good Regulator Theorem.
A quick go at it, might have typos.
Suppose we have
X: the (hidden) state
Y: the output/observation
and a predictor with
S: the (predictor) state
Ŷ: the predictor output
R: the reward or goal or what have you (some way of scoring 'was Ŷ right?')
with structure
X → Y, X → R, Y → S → Ŷ → R
Then GR trivially says S (predictor state) should model the posterior P(X|Y).
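As a concrete toy instance of the static case, the posterior that GR says S must carry can be computed directly. A minimal sketch; all the numbers and the variable names are made up for illustration, not part of the argument:

```python
# Toy discrete Bayes posterior: the content GR forces into the
# predictor state S. Prior and likelihood values are illustrative.
p_x = {0: 0.6, 1: 0.4}                 # prior P(X)
p_y_given_x = {0: {0: 0.8, 1: 0.2},    # likelihood P(Y | X)
               1: {0: 0.3, 1: 0.7}}

def posterior(y):
    """Return P(X | Y = y), the posterior S should model."""
    joint = {x: p_x[x] * p_y_given_x[x][y] for x in p_x}
    z = sum(joint.values())            # normalising constant P(Y = y)
    return {x: v / z for x, v in joint.items()}
```

E.g. posterior(1) puts most of its mass on X = 1, since Y = 1 is much likelier under that hidden state.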
Now if these are all instead processes (time-indexed), we have an HMM
X_t: the (hidden) states
Y_t: the observations
and a predictor process with
S_t: the (predictor) states
Ŷ_t: the predictions
R_t: the rewards
with structure
X_t → X_{t+1}, X_t → Y_t, S_{t−1} → S_t, Y_t → S_t → Ŷ_{t+1} → R_{t+1}, Y_{t+1} → R_{t+1}
Drawing together (X_{t+1}, Y_{t+1}, Ŷ_{t+1}, R_{t+1}) as G_t, the 'goal', we have a GR motif
X_t → Y_t, Y_t → S_t → G_t, S_{t−1} → S_t, X_t → G_t
so S_t must model P(X_t | S_{t−1}, Y_t); by induction, that is P(X_t | S_0, Y_1, ..., Y_t).
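The induction at the end is just the recursive Bayes filter: S_t is computed from (S_{t−1}, Y_t) by a predict step through the transition kernel followed by a Bayes update on the new observation. A minimal sketch with a made-up two-state HMM (transition and emission matrices are illustrative, not from the original):

```python
import numpy as np

# Hypothetical 2-state HMM. T[i, j] = P(X_{t+1}=j | X_t=i),
# E[i, k] = P(Y_t=k | X_t=i). Numbers chosen for illustration only.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],
              [0.1, 0.9]])

def filter_step(belief, y):
    """One recursion: S_t encodes P(X_t | S_{t-1}, Y_t).

    belief is P(X_{t-1} | Y_1..Y_{t-1}); y is the new observation Y_t.
    """
    predicted = belief @ T           # predict: P(X_t | Y_1..Y_{t-1})
    posterior = predicted * E[:, y]  # update: unnormalised P(X_t | Y_1..Y_t)
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])        # S_0: prior over the hidden state
for y in [0, 0, 1, 1]:               # a sample observation sequence Y_1..Y_4
    belief = filter_step(belief, y)
```

After the two trailing 1-observations, the belief has shifted onto the hidden state that emits 1 with high probability, which is exactly the P(X_t | S_0, Y_1, ..., Y_t) the induction above describes.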