You mention that PM could reflect the true dynamics of the environment; I read that and assumed it was a causal model mapping a (state, action) pair to the next state. But if PM instead captures a more general state of uncertainty, including uncertainty about the agent's own action, then the formalism does pick up the difference between EDT/UDT and CDT.
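To spell out the kind of gap I mean, here's a toy illustration (my own hypothetical numbers, with a made-up common cause U, not anything from your setup): when PM is a joint distribution over the agent's action and the environment, conditioning on the action (EDT-style) and intervening on it while holding the rest of the distribution fixed (CDT-style) give different expected outcomes.

```python
# Toy illustration (hypothetical numbers): a hidden variable U influences
# both the agent's action A and the outcome O, so conditioning on A carries
# evidence about U, while intervening on A does not.

# P(U): a hidden common cause.
p_u = {0: 0.5, 1: 0.5}

# P(A | U): agents with U = 1 are much more likely to take action a = 1.
p_a_given_u = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

# E[O | A, U]: the outcome depends strongly on U and only weakly on A.
e_o_given_au = {(0, 0): 0.0, (1, 0): 1.0,
                (0, 1): 10.0, (1, 1): 11.0}

def edt_value(a):
    """E[O | A=a]: conditioning on the action also updates beliefs about U."""
    num = sum(p_u[u] * p_a_given_u[u][a] * e_o_given_au[(a, u)] for u in p_u)
    den = sum(p_u[u] * p_a_given_u[u][a] for u in p_u)
    return num / den

def cdt_value(a):
    """E[O | do(A=a)]: intervening on the action keeps P(U) fixed."""
    return sum(p_u[u] * e_o_given_au[(a, u)] for u in p_u)

for a in (0, 1):
    print(f"a={a}:  EDT value {edt_value(a):.2f}   CDT value {cdt_value(a):.2f}")
```

The two value functions disagree exactly because the joint model lets the action be informative about the rest of the world, which a pure (state, action) → next-state transition model never does.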
Note that if PM reflects the agent’s logical uncertainty about its own behavior, then we can’t generally expand the expectations out as an integral over possible trajectories.
For example, if I train one model to map x to E[y(x)|x], and I train another model to map (x, b) to P(y(x)=b), then the two outputs won't in general be consistent: the first model's estimate of the expectation need not equal the integral of b against the second model's distribution.
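A quick sketch of the kind of mismatch I have in mind (a toy setup of my own: two scikit-learn models trained independently on the same data, with their misspecification standing in for the fact that nothing ties the two learned quantities together):

```python
# Toy sketch (hypothetical setup): train one model to estimate E[y | x]
# directly and a second model to estimate P(y = b | x), then compare the
# direct estimate with the expectation implied by the second model. The two
# objectives are optimized independently, so nothing forces them to agree.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Binary y whose probability depends nonlinearly on x, so both linear
# models are (deliberately) misspecified.
n = 2000
x = rng.uniform(-2, 2, size=(n, 1))
p_true = 1 / (1 + np.exp(-3 * np.sin(2 * x[:, 0])))
y = rng.binomial(1, p_true)

expectation_model = LinearRegression().fit(x, y)   # approximates E[y | x]
prob_model = LogisticRegression().fit(x, y)        # approximates P(y = b | x)

x_test = np.linspace(-2, 2, 5).reshape(-1, 1)
e_direct = expectation_model.predict(x_test)
# Expectation implied by the probability model: sum_b b * P(y = b | x) = P(y = 1 | x).
e_implied = prob_model.predict_proba(x_test)[:, 1]

for xi, d, i in zip(x_test[:, 0], e_direct, e_implied):
    print(f"x = {xi:+.1f}   direct E[y|x] = {d:.3f}   implied from P = {i:.3f}")
```

In the limit of perfect models the two columns coincide, but with any misspecification, or logical uncertainty about the learned quantities, they come apart, which is why I don't think we can freely expand the expectations as integrals over trajectories.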
When thinking about the connection between theoretical frameworks and practical algorithms, I think this would be an interesting issue to push on.
I see; you’re right.