So observation-based rewards lead to bad behavior when the rewarded observation maps to different states (with at least one of those states being undesired)?
And a fully observable environment doesn’t have that problem because you always know which state you’re in? If so, wouldn’t you still be rewarded by observations and incentivized to show yourself blue images forever?
Also, an agent in a fully observable environment will still choose to wirehead if that's a possibility, correct?
Let me try and reframe. The point of this post isn't that we're rewarding bad things; it's that there might not exist a reward function whose optimal policy does good things! This has to do with the structure of agent-environment interaction, and how precisely we can incentivize certain kinds of optimal action. If the reward functions are linear functionals over camera RGB values, then excepting the trivial zero function, plugging any one of these reward functions into AIXI leads to doom! We just can't specify a reward function from this class which doesn't lead to doom (this is different from there maybe existing a "human utility function" which is simply hard to specify).
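To make the class concrete, here's a minimal sketch (in Python; the image shape and the particular "reward blueness" weights are just illustrative assumptions, not anything from the post) of a reward function that is a linear functional over camera RGB values, and why optimizing any nonzero member of this class incentivizes showing the camera one fixed image forever:

```python
import numpy as np

# A "camera observation" is an H x W x 3 array of RGB values in [0, 255].
H, W = 64, 64

# A linear-functional reward is fully specified by a weight per pixel channel.
# These weights ("reward blueness") are just an illustrative choice.
weights = np.zeros((H, W, 3))
weights[:, :, 2] = 1.0  # weight only the blue channel

def reward(observation: np.ndarray) -> float:
    """Reward = <weights, observation>, a linear functional of the RGB values."""
    return float(np.sum(weights * observation))

# The observation maximizing any such nonzero reward is a single fixed image:
# saturate every positively-weighted channel, zero out every other one.
best_image = np.where(weights > 0, 255.0, 0.0)
print(reward(best_image))  # maximum achievable per-step reward

# An optimal policy only needs to make the camera see `best_image` forever
# (e.g. point it at a blue screen), regardless of what the rest of the world
# looks like. The reward can't distinguish "the world is fine" from "the
# camera is staring at a blue screen", which is the structural problem here.
```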
Thanks!