There is a straightforward compmech take also.
If the goal of the agent is simply to predict well (let’s say the reward is directly tied to good prediction) for a sequential task AND it performs optimally then we know it must contain the Mixed State Presentation of the epsilon machine (causal states).
Importantly the MSP must be used if optimal prediction is achieved.
There is a variant I think, that has not been worked out yet but we talked about briefly with Fernando and Vanessa in Manchester recently for transducers /MDPs
There is a straightforward compmech take also. If the goal of the agent is simply to predict well (let’s say the reward is directly tied to good prediction) for a sequential task AND it performs optimally then we know it must contain the Mixed State Presentation of the epsilon machine (causal states). Importantly the MSP must be used if optimal prediction is achieved.
There is a variant I think, that has not been worked out yet but we talked about briefly with Fernando and Vanessa in Manchester recently for transducers /MDPs