You are absolutely right, and I am of course absolutely and embarrassingly wrong.
The minimal optimal predictor of the simple nonunifilar source, presented as a Hidden Markov Model, does indeed have infinitely many states. This implies that any other architecture must be capable of expressing infinitely many states, but that is quite a weak requirement: it is easy for a machine with finite memory to dynamically express infinitely many states, for instance by tracking a continuous belief vector over the hidden states. In particular, a transformer should absolutely be able to learn the mixed-state presentation (MSP) of the epsilon machine of the simple nonunifilar source; indeed, it can even be solved analytically.
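To make that concrete, here is a minimal sketch of the Bayesian belief update that the MSP formalizes. The transition probabilities below are one common parameterization of the simple nonunifilar source and are my assumption here, not something established above; the point is only that a two-dimensional belief vector, updated with finite memory, visits infinitely many distinct values.

```python
import numpy as np

# Labeled transition matrices: T[x][i, j] = P(emit x and go to state j | in state i).
# Assumed parameterization of the simple nonunifilar source: from state A,
# emit 0 and stay (prob 1/2) or emit 1 and move to B (prob 1/2); from state B,
# emit 1 and either stay or move to A (prob 1/2 each). Symbol 1 is ambiguous
# about the hidden state, which is what makes the source nonunifilar.
T = {
    0: np.array([[0.5, 0.0],
                 [0.0, 0.0]]),
    1: np.array([[0.0, 0.5],
                 [0.5, 0.5]]),
}

def update_belief(eta, x):
    """One Bayesian filtering step: eta' is proportional to eta @ T[x]."""
    unnormalized = eta @ T[x]
    return unnormalized / unnormalized.sum()

# For this parameterization, T[0] + T[1] has a uniform stationary
# distribution, so start the belief there.
eta = np.array([0.5, 0.5])
for x in [1, 1, 1, 0, 1]:
    eta = update_belief(eta, x)
    print(x, eta)
```

Under these assumed parameters, a run of 1s walks the belief through an ever-growing sequence of distinct values converging toward a fixed point, while a single 0 synchronizes the belief back to a corner of the simplex. That is exactly the sense in which a finite-memory machine can express the infinitely many states of the MSP.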
This was an embarrassing mistake I should not have made. I regret my rash overconfidence; I should have taken a moment to think it through, since the statement was obviously wrong. Thank you for pointing it out.