In an interesting parallel to John Wentworth’s Fixing the Good Regulator Theorem, I have an MDP result that says:

Suppose we’re playing a game where I give you a reward function and you give me its optimal value function in the MDP. If you let me do this for |S| reward functions (one for each state in the environment), and you’re able to provide the optimal value function for each, then you know enough to reconstruct the entire environment (up to isomorphism).
Roughly: being able to complete linearly many tasks in the state space means you have enough information to model the entire environment.
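To make the "game" concrete, here is a minimal sketch assuming a small toy MDP: for each state s, take the indicator reward function R_s (reward 1 at s, 0 elsewhere) and compute its optimal value function by value iteration. The 3-state environment, the discount factor, and the helper name `optimal_value_function` are all illustrative choices of mine, not part of the result, and the interesting direction (reconstructing the dynamics from these |S| value functions) is not implemented here; this only shows what information the value-function provider is handing over.

```python
import numpy as np

# Deterministic toy MDP: next_state[s][a] is the successor of state s under action a.
next_state = np.array([
    [1, 2],   # from state 0: action 0 -> 1, action 1 -> 2
    [0, 1],   # from state 1: action 0 -> 0, action 1 -> 1 (self-loop)
    [2, 0],   # from state 2: action 0 -> 2 (self-loop), action 1 -> 0
])
n_states, n_actions = next_state.shape
gamma = 0.9  # discount factor (illustrative)

def optimal_value_function(reward, iters=1000):
    """Value iteration for a deterministic MDP with a state-based reward vector."""
    v = np.zeros(n_states)
    for _ in range(iters):
        # Bellman optimality backup: V(s) = R(s) + gamma * max_a V(next(s, a))
        v = reward + gamma * np.max(v[next_state], axis=1)
    return v

# Play the game once per state: |S| indicator rewards, |S| optimal value functions.
value_functions = {
    s: optimal_value_function(np.eye(n_states)[s]) for s in range(n_states)
}
for s, v in value_functions.items():
    print(f"optimal values for indicator reward on state {s}: {np.round(v, 3)}")
```

The claim above is that this collection of |S| value functions, taken together, pins down the transition structure of the environment up to isomorphism.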