Stuart_Armstrong comments on Humans can be assigned any values whatsoever...

Stuart_Armstrong 13 Oct 2017 12:43 UTC
2 points
The model m(3) is compatible with any reward function, so any reward function R can be valid for the agent. It’s true that the pair (m(3), R) can be quite complex (since m(3) is very complex), but any R is compatible. Most m’s are also compatible: technically, any m that maps some R to π(h) works, and “almost all” m’s are surjective.
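
To make the compatibility claim concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the two-state world, the binary rewards, and both planners are hypothetical. In particular, `constant_planner` is only a stand-in for a model that fits every R, and is not the definition of m(3).

```python
from itertools import product

STATES = ("s0", "s1")
ACTIONS = ("a", "b")

# The observed policy pi(h): the action the agent takes in each state (toy data).
PI_H = {"s0": "a", "s1": "b"}


def all_rewards():
    """Enumerate every reward function R: (state, action) -> {0, 1}."""
    keys = list(product(STATES, ACTIONS))
    for values in product((0, 1), repeat=len(keys)):
        yield dict(zip(keys, values))


def constant_planner(reward):
    """Hypothetical stand-in: ignores the reward and always returns pi(h)."""
    return dict(PI_H)


def greedy_planner(reward):
    """Picks the highest-reward action in each state (ties go to the first action)."""
    return {s: max(ACTIONS, key=lambda a: reward[(s, a)]) for s in STATES}


def compatible(planner, reward):
    """A pair (m, R) is compatible when m maps R to the observed policy."""
    return planner(reward) == PI_H


def image(planner):
    """Set of all policies the planner can output over every reward function."""
    return {tuple(sorted(planner(r).items())) for r in all_rewards()}


# The constant planner is compatible with every one of the 16 reward functions.
print(all(compatible(constant_planner, r) for r in all_rewards()))

# The greedy planner reaches all 4 possible policies, so it is surjective
# and therefore has at least one R that it maps to pi(h).
print(len(image(greedy_planner)) == len(ACTIONS) ** len(STATES))
```

In this toy setting the first planner says nothing about R, while the second is compatible only with the reward functions that happen to rank the observed actions highest; both still count as compatible in the sense used above.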