Is there a way to bind the optimization process to actual patterns in the environment? To design a framework in which the screen informs the agent about the patterns it should optimize for? The answer is yes: we can simply define a utility function that assigns a value to every possible future history and use it to replace the reward system in the agent specification [...]
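To make the quoted proposal concrete, here is a minimal sketch of that move, with everything (the toy environment, `env_model`, the particular `utility` function, the planning horizon) invented for illustration rather than taken from the source. The point is structural: instead of summing per-step rewards, the planner scores each candidate future history as a whole.

```python
import itertools

# Toy setup: all names here are assumptions for illustration.
ACTIONS = ["left", "right", "stay"]
HORIZON = 3

def env_model(state, action):
    """Assumed deterministic world model: returns the next state."""
    if action == "left":
        return state - 1
    if action == "right":
        return state + 1
    return state

def utility(history):
    """Assigns a value to an entire future history of states, not to
    individual percepts; here, ending near the origin stands in for
    'patterns in the environment' we care about."""
    return -abs(history[-1])

def best_plan(state):
    """Score every action sequence by the utility of the whole history
    it produces: the 'replace the reward system with a utility function
    over histories' move from the quote."""
    best = None
    for plan in itertools.product(ACTIONS, repeat=HORIZON):
        s, history = state, [state]
        for a in plan:
            s = env_model(s, a)
            history.append(s)
        score = utility(history)
        if best is None or score > best[0]:
            best = (score, plan)
    return best

print(best_plan(2))  # (0, ('left', 'left', 'stay'))
```

Note that `utility` is evaluated on world states the model predicts, not on anything the agent perceives, which is exactly where the trouble discussed next comes in.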
If only the problem were that easy. Telling an agent to optimise a utility function over external world states, rather than a reward function, runs into the problem of how you tell a machine the difference between real and apparent utility when all it has to go on is sensory data.
It isn't easy to get this right when you have a superintelligent agent working to drive a wedge between your best effort at the specification and the best possible one.
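Here is a tiny toy model of that real-versus-apparent gap; the scenario, the dictionary keys, and every function below are my own invention, not anything from the source. If the utility function is in practice evaluated on percepts, an agent that can tamper with its own sensor achieves maximal apparent utility while leaving the world untouched.

```python
# Toy illustration (all names assumed) of real vs apparent utility.

def true_utility(world_state):
    """What we actually care about: a fact about the external world."""
    return 1.0 if world_state["diamond_in_vault"] else 0.0

def sensor(world_state):
    """The agent's only channel to the world; a capable agent may be
    able to tamper with it."""
    if world_state["sensor_spoofed"]:
        return {"camera_shows_diamond": True}  # apparent, not real
    return {"camera_shows_diamond": world_state["diamond_in_vault"]}

def apparent_utility(observation):
    """A utility function defined over percepts rather than states."""
    return 1.0 if observation["camera_shows_diamond"] else 0.0

world = {"diamond_in_vault": False, "sensor_spoofed": False}
world["sensor_spoofed"] = True  # tamper with the sensor, don't fetch the diamond

print(apparent_utility(sensor(world)))  # 1.0 -- looks perfect on the screen
print(true_utility(world))              # 0.0 -- the world is unchanged
```

The two scores diverge precisely because `apparent_utility` only ever sees what `sensor` reports, which is the wedge the superintelligent agent gets to exploit.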