If I understand correctly, the next-token prediction of Mess3 is related to the current-state prediction by a nonsingular linear transformation. So a linear probe showing “the meta-structure of an observer’s belief updates over the hidden states of the generating structure” is equivalent to one showing “the structure of the next-token predictions”, no?
I suppose if you had more hidden states than observables, you could distinguish hidden-state prediction from next-token prediction by the dimension of the fractal.
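The first point can be checked numerically. Below is a minimal sketch, assuming a Mess3-like parametrization in which the hidden state moves with a symmetric transition matrix `P` and the token is emitted on arrival with matrix `E`. The values `X = 0.05` and `A = 0.85`, the exact form of `P` and `E`, and all helper names are my assumptions rather than anything stated above. Under those assumptions the next-token distribution is the belief (as a row vector) times `W = P E`, and `W` is invertible, so the belief can be recovered linearly from the next-token prediction.

```python
# Hypothetical Mess3-like parametrization (an assumption for illustration):
# three hidden states and three tokens, both indexed 0..2.
X, A = 0.05, 0.85              # assumed parameter values
Y, B = 1 - 2 * X, (1 - A) / 2  # chosen so every row sums to 1

# P[s][t]: probability of moving from hidden state s to hidden state t.
P = [[Y if s == t else X for t in range(3)] for s in range(3)]
# E[t][k]: probability of emitting token k on arriving in state t.
E = [[A if t == k else B for k in range(3)] for t in range(3)]


def matmul(M, N):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(M[i][j] * N[j][k] for j in range(len(N)))
             for k in range(len(N[0]))] for i in range(len(M))]


def det3(M):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    D = det3(M)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / D for v in row] for row in adj]


# Linear map from belief over the current state to next-token probabilities.
W = matmul(P, E)


def next_token_probs(belief):
    """Next-token distribution implied by a belief over hidden states."""
    return matmul([belief], W)[0]


def belief_from_next_token(probs):
    """Invert the map; only possible because W is square and nonsingular."""
    return matmul([probs], inv3(W))[0]


belief = [0.7, 0.2, 0.1]
probs = next_token_probs(belief)
recovered = belief_from_next_token(probs)
print("det(W) =", det3(W))
print("next-token probs =", probs)
print("recovered belief =", recovered)
```

With more hidden states than tokens, `W` would no longer be square, so this inversion would not be available. That is the case the second paragraph is about.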