Nisan comments on Transformers Represent Belief State Geometry in their Residual Stream

Nisan 18 Apr 2024 17:05 UTC
3 points
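I suppose if you had more hidden states than observables, you could distinguish hidden-state prediction from next-token prediction by the dimension of the fractal: beliefs over n hidden states live in an (n-1)-dimensional simplex, while next-token distributions over m observables are confined to an (m-1)-dimensional one.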
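
A rough way to check this numerically, as a sketch only: the randomly generated HMM, the function names (`random_hmm`, `belief_cloud`, `affine_rank`), and the sizes (4 hidden states, 2 observables) are all assumptions made for illustration and do not come from the post. The affine rank of each point cloud is used as a crude stand-in that only upper-bounds the fractal dimension; it is not a fractal-dimension estimate.

```python
import numpy as np


def random_hmm(n_hidden, n_obs, rng):
    """Hypothetical HMM: T[x, i, j] = P(emit x and move to j | currently in i)."""
    T = rng.random((n_obs, n_hidden, n_hidden))
    # Normalize so that, for each current state i, probabilities over (x, j) sum to 1.
    T /= T.sum(axis=(0, 2), keepdims=True)
    return T


def belief_cloud(T, n_steps, rng):
    """Sample one trajectory; record belief states and next-token distributions."""
    n_obs, n_hidden, _ = T.shape
    b = np.full(n_hidden, 1.0 / n_hidden)  # start from a uniform belief (an assumption)
    beliefs, preds = [], []
    for _ in range(n_steps):
        # unnorm[x, j] = sum_i b[i] * T[x, i, j]
        unnorm = np.einsum("i,xij->xj", b, T)
        p_obs = unnorm.sum(axis=1)
        p_obs = p_obs / p_obs.sum()  # guard against floating-point drift
        beliefs.append(b)
        preds.append(p_obs)
        x = rng.choice(n_obs, p=p_obs)  # sample the next token
        b = unnorm[x] / unnorm[x].sum()  # Bayesian belief update
    return np.array(beliefs), np.array(preds)


def affine_rank(points, tol=1e-8):
    """Dimension of the affine hull of a point cloud (upper bound on fractal dimension)."""
    centered = points - points.mean(axis=0)
    return int(np.linalg.matrix_rank(centered, tol=tol))


rng = np.random.default_rng(0)
T = random_hmm(n_hidden=4, n_obs=2, rng=rng)
beliefs, preds = belief_cloud(T, n_steps=2000, rng=rng)

print("belief cloud affine rank:    ", affine_rank(beliefs))
print("next-token cloud affine rank:", affine_rank(preds))
```

With 4 hidden states and 2 observables, the belief cloud can span up to 3 dimensions while the next-token cloud can span at most 1. So a residual-stream geometry that needs more than `n_obs - 1` dimensions could not be a representation of the next-token distribution alone.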