The model m(3) is compatible with any reward function, so any reward function R is a valid candidate for the agent's reward. It's true that the pair (m(3), R) can be quite complex (since m(3) itself is very complex), but every R is compatible with it. (Most m's are also compatible: technically, any m that maps to π(h) is, and "almost all" m's are surjective.)
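To make the compatibility claims concrete, here is a toy sketch under assumptions that are not in the original: there are finitely many histories and actions, a "model" is any function from reward functions to policies, and a pair (m, R) counts as compatible when m(R) equals the observed policy π(h). The names `constant_model` and `rational_model` are hypothetical. The constant model only illustrates how a model can be compatible with every R; it is not a claim about how m(3) is actually defined. The rational model illustrates a surjective m, which is compatible with at least one R.

```python
from itertools import product

# Hypothetical toy setting: two histories, two actions.
HISTORIES = ("h0", "h1")
ACTIONS = ("a", "b")
PAIRS = [(h, a) for h in HISTORIES for a in ACTIONS]

# A policy is a tuple giving one action per history (aligned with HISTORIES).
ALL_POLICIES = set(product(ACTIONS, repeat=len(HISTORIES)))
# A reward function assigns 0 or 1 to each (history, action) pair: 16 in total.
ALL_REWARDS = [dict(zip(PAIRS, vals)) for vals in product((0, 1), repeat=len(PAIRS))]

# Stand-in for the observed policy pi(h).
OBSERVED_POLICY = ("a", "b")


def rational_model(reward):
    """Pick the highest-reward action in each history (ties go to 'a')."""
    return tuple(max(ACTIONS, key=lambda a: reward[(h, a)]) for h in HISTORIES)


def constant_model(reward):
    """Ignore the reward entirely and always output the observed policy."""
    return OBSERVED_POLICY


def compatible(model, reward):
    """(model, reward) is compatible if the model maps reward to pi(h)."""
    return model(reward) == OBSERVED_POLICY


def compatible_rewards(model):
    return [r for r in ALL_REWARDS if compatible(model, r)]


def is_surjective(model):
    """True if every possible policy is produced by some reward function."""
    return {model(r) for r in ALL_REWARDS} == ALL_POLICIES


if __name__ == "__main__":
    print(len(compatible_rewards(constant_model)), "of", len(ALL_REWARDS))  # 16 of 16
    print(len(compatible_rewards(rational_model)))  # 3
    print(is_surjective(rational_model), is_surjective(constant_model))  # True False
```

The constant model is compatible with all 16 reward functions because it never consults R, whereas the rational model is compatible with only 3 of them; being surjective is what guarantees it is compatible with at least one.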