In an interesting parallel to John Wentworth’s Fixing the Good Regulator Theorem, I have an MDP result that says:

Suppose we’re playing a game where I give you a reward function and you give me its optimal value function in the MDP. If you let me do this for |S| reward functions (one for each state in the environment), and you’re able to provide the optimal value function for each, then you know enough to reconstruct the entire environment (up to isomorphism).
Roughly: being able to complete linearly many tasks in the state space means you have enough information to model the entire environment.
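To make the "game" concrete, here is a minimal sketch assuming a small toy MDP: for each state s, take the indicator reward function R_s (reward 1 at s, 0 elsewhere) and compute its optimal value function by value iteration. The 3-state environment, the discount factor, and the helper name `optimal_value_function` are all illustrative choices of mine, not part of the result, and the interesting direction (reconstructing the dynamics from these |S| value functions) is not implemented here; this only shows what information the value-function provider is handing over.

```python
import numpy as np

# Deterministic toy MDP: next_state[s][a] is the successor of state s under action a.
next_state = np.array([
    [1, 2],   # from state 0: action 0 -> 1, action 1 -> 2
    [0, 1],   # from state 1: action 0 -> 0, action 1 -> 1 (self-loop)
    [2, 0],   # from state 2: action 0 -> 2 (self-loop), action 1 -> 0
])
n_states, n_actions = next_state.shape
gamma = 0.9  # discount factor (illustrative)

def optimal_value_function(reward, iters=1000):
    """Value iteration for a deterministic MDP with a state-based reward vector."""
    v = np.zeros(n_states)
    for _ in range(iters):
        # Bellman optimality backup: V(s) = R(s) + gamma * max_a V(next(s, a))
        v = reward + gamma * np.max(v[next_state], axis=1)
    return v

# Play the game once per state: |S| indicator rewards, |S| optimal value functions.
value_functions = {
    s: optimal_value_function(np.eye(n_states)[s]) for s in range(n_states)
}
for s, v in value_functions.items():
    print(f"optimal values for indicator reward on state {s}: {np.round(v, 3)}")
```

The claim above is that this collection of |S| value functions, taken together, pins down the transition structure of the environment up to isomorphism.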