No, the actual hidden Markov process used to generate the awesome triangle fractal image is not the {0,1,random} model but a different one, which is called “Mess3” and has a symmetry between the 3 hidden states.
Also, they’re not claiming the transformer merely learns the hidden states of the HMM, but something more complicated called the “mixed state presentation”: not the states the HMM can be in, but the (usually much larger) set of belief states that an ideal predictor trying to “sync” to it might go through.
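To make the hidden-state vs. belief-state distinction concrete, here is a rough sketch in Python. It is not the actual Mess3 process: the 3-state symmetric HMM, the `STAY` and `EMIT_MATCH` numbers, and the "emit a token on arriving in the new state" convention are all assumptions I made up for illustration. It just Bayes-updates a belief over the hidden states after each observed token and collects the distinct beliefs an ideal predictor could reach.

```python
N_STATES = 3
# Hypothetical parameters for illustration only, NOT the published Mess3 values.
STAY = 0.8        # probability of remaining in the same hidden state
EMIT_MATCH = 0.6  # probability that the emitted token equals the hidden state


def transition(i, j):
    """P(next state = j | current state = i), symmetric across states."""
    return STAY if i == j else (1 - STAY) / 2


def emission(j, x):
    """P(token = x | newly entered state = j) (assumed convention)."""
    return EMIT_MATCH if j == x else (1 - EMIT_MATCH) / 2


def update(belief, x):
    """Bayes-update a belief over hidden states after observing token x."""
    unnorm = [
        sum(belief[i] * transition(i, j) for i in range(N_STATES)) * emission(j, x)
        for j in range(N_STATES)
    ]
    total = sum(unnorm)
    return tuple(u / total for u in unnorm)


def reachable_beliefs(depth):
    """Collect the distinct beliefs reached by token sequences of length <= depth."""
    start = (1 / 3, 1 / 3, 1 / 3)  # stationary distribution of this symmetric chain
    seen = {tuple(round(p, 9) for p in start)}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for belief in frontier:
            for x in range(N_STATES):
                new_belief = update(belief, x)
                key = tuple(round(p, 9) for p in new_belief)
                if key not in seen:
                    seen.add(key)
                    next_frontier.append(new_belief)
        frontier = next_frontier
    return seen


beliefs = reachable_beliefs(4)
print(f"{N_STATES} hidden states, {len(beliefs)} distinct belief states")
```

Each belief is a point in the probability simplex over the 3 hidden states, so plotting these points in a triangle is the kind of picture the mixed state presentation gives you. Even this toy version reaches more belief states than there are hidden states after a few tokens.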