You mention that PM could reflect the true dynamics of the environment; I read that and assumed it was a causal model mapping a (state, action) pair to the next state. But if PM instead captures a more general state of uncertainty, including uncertainty about the agent's own action, then the formalism does pick up the difference between EDT/UDT and CDT.
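To spell out the kind of gap I mean, here's a toy illustration (my own hypothetical numbers, with a made-up common cause U, not anything from your setup): when PM is a joint distribution over the agent's action and the environment, conditioning on the action (EDT-style) and intervening on it while holding the rest of the distribution fixed (CDT-style) give different expected outcomes.

```python
# Toy illustration (hypothetical numbers): a hidden variable U influences
# both the agent's action A and the outcome O, so conditioning on A carries
# evidence about U, while intervening on A does not.

# P(U): a hidden common cause.
p_u = {0: 0.5, 1: 0.5}

# P(A | U): agents with U = 1 are much more likely to take action a = 1.
p_a_given_u = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

# E[O | A, U]: the outcome depends strongly on U and only weakly on A.
e_o_given_au = {(0, 0): 0.0, (1, 0): 1.0,
                (0, 1): 10.0, (1, 1): 11.0}

def edt_value(a):
    """E[O | A=a]: conditioning on the action also updates beliefs about U."""
    num = sum(p_u[u] * p_a_given_u[u][a] * e_o_given_au[(a, u)] for u in p_u)
    den = sum(p_u[u] * p_a_given_u[u][a] for u in p_u)
    return num / den

def cdt_value(a):
    """E[O | do(A=a)]: intervening on the action keeps P(U) fixed."""
    return sum(p_u[u] * e_o_given_au[(a, u)] for u in p_u)

for a in (0, 1):
    print(f"a={a}:  EDT value {edt_value(a):.2f}   CDT value {cdt_value(a):.2f}")
```

The two value functions disagree exactly because the joint model lets the action be informative about the rest of the world, which a pure (state, action) → next-state transition model never does.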
Note that if PM reflects the agent’s logical uncertainty about its own behavior, then we can’t generally expand the expectations out as an integral over possible trajectories.
For example, if I train one model to map x to E[y(x)|x], and I train another model to map (x, b) to P(y(x)=b), then the two outputs won't in general be consistent: the first model's estimate of the expectation need not equal the integral of b against the second model's distribution.
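A quick sketch of the kind of mismatch I have in mind (a toy setup of my own: two scikit-learn models trained independently on the same data, with their misspecification standing in for the fact that nothing ties the two learned quantities together):

```python
# Toy sketch (hypothetical setup): train one model to estimate E[y | x]
# directly and a second model to estimate P(y = b | x), then compare the
# direct estimate with the expectation implied by the second model. The two
# objectives are optimized independently, so nothing forces them to agree.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Binary y whose probability depends nonlinearly on x, so both linear
# models are (deliberately) misspecified.
n = 2000
x = rng.uniform(-2, 2, size=(n, 1))
p_true = 1 / (1 + np.exp(-3 * np.sin(2 * x[:, 0])))
y = rng.binomial(1, p_true)

expectation_model = LinearRegression().fit(x, y)   # approximates E[y | x]
prob_model = LogisticRegression().fit(x, y)        # approximates P(y = b | x)

x_test = np.linspace(-2, 2, 5).reshape(-1, 1)
e_direct = expectation_model.predict(x_test)
# Expectation implied by the probability model: sum_b b * P(y = b | x) = P(y = 1 | x).
e_implied = prob_model.predict_proba(x_test)[:, 1]

for xi, d, i in zip(x_test[:, 0], e_direct, e_implied):
    print(f"x = {xi:+.1f}   direct E[y|x] = {d:.3f}   implied from P = {i:.3f}")
```

In the limit of perfect models the two columns coincide, but with any misspecification, or logical uncertainty about the learned quantities, they come apart, which is why I don't think we can freely expand the expectations as integrals over trajectories.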
When thinking about the connection between theoretical frameworks and practical algorithms, I think this would be an interesting issue to push on.
I see; you’re right.