johnswentworth comments on Transformers Represent Belief State Geometry in their Residual Stream

johnswentworth Apr 17, 2024, 9:04 PM
15 points
0
We’re now working through understanding all the pieces of this, and we’ve calculated an MSP which doesn’t quite look like the one in the post:
(Ignore the skew, David’s still fiddling with the projection into 2D. The important noticeable part is the absence of “overlap” between the three copies of the main shape, compared to the fractal from the post.)
Specifically, each point in that visual corresponds to a distribution $(P [H^{t} = H_{0} | O^{< t}], P [H^{t} = H_{1} | O^{< t}], P [H^{t} = H_{2} | O^{< t}])$ for some value of the observed symbols $O$. The image itself is of the points on the probability simplex. From looking at a couple of Crutchfield papers, it sounds like that’s what the MSP is supposed to be.
The update equations are:
- $P [H^{t + 1} | O^{\leq t}] = \sum_{H^{t}} P [H^{t + 1} | H^{t}] P [H^{t} | O^{\leq t}]$
- $P [H^{t} | O^{\leq t}] = \frac{1}{Z} P [O^{t} | H^{t}] P [H^{t} | O^{< t}]$
with $P [H^{t + 1} | H^{t}]$ given by the transition probabilities, $P [O^{t} | H^{t}]$ given by the observation probabilities, and $Z$ a normalizer. We generate the image above by initializing some random distribution $P [H^{0}]$, then iterating the equations and plotting each point.
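For anyone wanting to reproduce this, here is a minimal sketch of the two update equations above in Python with NumPy. The matrices `T` and `E` are random placeholders, not the mess3 parameters from the post, and the names `belief_trajectory`, `T`, `E`, and `prior` are hypothetical, chosen only for illustration. It assumes `T[i, j]` is $P [H^{t + 1} = j | H^{t} = i]$ and `E[i, o]` is $P [O^{t} = o | H^{t} = i]$.

```python
import numpy as np


def belief_trajectory(T, E, prior, n_steps, rng):
    """Sample one run of the HMM and return the beliefs P[H^t | O^{<t}]."""
    n_states, n_symbols = E.shape
    h = rng.choice(n_states, p=prior)   # true hidden state, drawn from the prior
    belief = prior.copy()               # P[H^0], before any observation
    points = []
    for _ in range(n_steps):
        points.append(belief.copy())
        o = rng.choice(n_symbols, p=E[h])   # emit a symbol from the true state
        # Observation update: P[H^t | O^{<=t}] = (1/Z) P[O^t | H^t] P[H^t | O^{<t}]
        posterior = E[:, o] * belief
        posterior /= posterior.sum()
        # Transition update: P[H^{t+1} | O^{<=t}] = sum_{H^t} P[H^{t+1} | H^t] P[H^t | O^{<=t}]
        belief = posterior @ T
        h = rng.choice(n_states, p=T[h])    # advance the true hidden state
    return np.array(points)


rng = np.random.default_rng(0)
# Placeholder parameters (assumption): random 3-state, 3-symbol HMM, NOT mess3.
T = rng.dirichlet(np.ones(3), size=3)   # each row is a distribution over next states
E = rng.dirichlet(np.ones(3), size=3)   # each row is a distribution over symbols
prior = rng.dirichlet(np.ones(3))       # random initial distribution P[H^0]

points = belief_trajectory(T, E, prior, n_steps=1000, rng=rng)
```

Each row of `points` is one point on the probability simplex; projecting those rows into 2D gives the kind of picture described here.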
Off the top of your head, any idea what might account for the mismatch (other than a bug in our code, which we’re already checking)? Are we calculating the right thing, i.e. values of $(P [H^{t} = H_{0} | O^{< t}], P [H^{t} = H_{1} | O^{< t}], P [H^{t} = H_{2} | O^{< t}])$? Are the transition and observation probabilities from the graphic in the post the same parameters used to generate the fractal? Is there something which people always forget to account for when calculating these things?
- Adam Shai Apr 17, 2024, 10:03 PM
  10 points
  0
  Everything looks right to me! This is the annoying problem that people forget to write the actual parameters they used in their work (sorry).
  Try x=0.05, alpha=0.85. I’ve edited the footnote with this info as well.
  - johnswentworth Apr 17, 2024, 10:25 PM
    3 points
    0
    Yup, that was it, thank you!
- Adam Shai Apr 17, 2024, 10:13 PM
  5 points
  0
  Oh wait, one thing that looks not quite right is the initial distribution. Instead of starting randomly, we begin with the optimal initial distribution, which is the steady-state distribution. It can be computed by finding the eigenvector of the transition matrix that has an eigenvalue of 1. Maybe in practice that doesn’t matter that much for mess3, but in general it could.
  - Jett Janiak May 17, 2024, 9:47 AM
    1 point
    0
    For the two sets of mess3 parameters I checked, the stationary distribution was uniform.
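The steady-state initialization suggested in this thread can be sketched as follows. This assumes a row-stochastic transition matrix `T` with `T[i, j]` equal to $P [H^{t + 1} = j | H^{t} = i]$, in which case the stationary distribution is the left eigenvector of `T` with eigenvalue 1 (equivalently, an eigenvector of `T.T`). The function name `stationary_distribution` and the example matrix are hypothetical; the matrix is a simple symmetric one, not the mess3 parameters.

```python
import numpy as np


def stationary_distribution(T):
    """Return pi with pi @ T == pi, for a row-stochastic T."""
    # Left eigenvectors of T are the (right) eigenvectors of T transposed.
    eigvals, eigvecs = np.linalg.eig(T.T)
    # Pick the eigenvalue closest to 1; numerically it is rarely exactly 1.
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    # Rescale so the entries sum to 1 (this also fixes the arbitrary sign).
    return pi / pi.sum()


# Hypothetical symmetric transition matrix (assumption, not mess3).
T = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])
pi = stationary_distribution(T)
```

For any doubly stochastic matrix like this one (rows and columns both sum to 1), `pi` comes out uniform, so a uniform stationary distribution is what one would expect whenever the transition matrix has that symmetry.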