If you’re in a situation where you can reasonably extrapolate from past rewards to future reward, you can probably extrapolate previously seen “normal behaviour” to normal behaviour in your situation. Reinforcement learning is limited—you can’t always extrapolate past reward—but it’s not obvious that imitative regularisation is fundamentally more limited.
(normal does not imply safe, of course)
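For concreteness, one common way to cash out "imitative regularisation" (my reading of the term, not something fixed upthread) is an objective that trades expected reward against a KL penalty toward the imitation policy's notion of normal behaviour. A toy sketch with made-up distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete action distributions."""
    return float(np.sum(p * np.log(p / q)))

def regularised_objective(pi, pi_imitator, expected_reward, beta=0.1):
    """Toy KL-regularised objective: maximise reward while staying close
    to the behaviour the imitator considers 'normal'. All numbers here
    are illustrative, not from the discussion above."""
    return expected_reward - beta * kl(pi, pi_imitator)

# Three actions; the imitator thinks action 0 is the 'normal' one,
# while the reward-seeking policy has drifted toward action 2.
pi_imitator = np.array([0.8, 0.15, 0.05])
pi_rl       = np.array([0.1, 0.1, 0.8])

print(regularised_objective(pi_rl, pi_imitator, expected_reward=1.0, beta=0.5))
```

The disagreement below is then about whether the KL term generalises to new situations any worse than the learned reward term does.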
I dunno, I think you can generalize reward farther than behavior. E.g. I might very reasonably issue high reward for winning a game of chess, or arriving at my destination safe and sound, or curing malaria, even if each involved intermediate steps that don’t make sense as ‘things I might do.’
I do agree there are limits to how much extrapolation we actually want, I just think there’s a lot of headroom for AIs to achieve ‘normal’ ends via ‘abnormal’ means.
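To make the asymmetry I'm claiming concrete (my own toy framing): an outcome-based reward can score a trajectory whose intermediate steps the demonstrator would never take, while a behavior-matching loss is only defined where we actually have demonstrator actions to compare against.

```python
def outcome_reward(trajectory):
    """Reward defined only on the end state (e.g. 'won the game',
    'arrived safely'); intermediate steps don't need to look human."""
    return 1.0 if trajectory["outcome"] == "goal_achieved" else 0.0

def behavior_match_loss(trajectory, demonstrator_action):
    """Behavior matching needs a demonstrator action for every state;
    off-distribution states have no 'what I might do' label at all."""
    losses = []
    for state, action in trajectory["steps"]:
        label = demonstrator_action(state)  # may simply not exist
        if label is None:
            raise ValueError(f"no demonstrator behavior for state {state!r}")
        losses.append(0.0 if action == label else 1.0)
    return sum(losses) / len(losses)

# An 'abnormal means, normal end' trajectory still gets full outcome reward.
weird_but_winning = {
    "steps": [("s0", "sacrifice_queen"), ("s1", "strange_rook_lift")],
    "outcome": "goal_achieved",
}
print(outcome_reward(weird_but_winning))  # 1.0, no behavior labels required

demo_policy = {"s0": "develop_opening"}.get  # demonstrator never reached "s1"
try:
    behavior_match_loss(weird_but_winning, demo_policy)
except ValueError as e:
    print(e)  # the behavior target just isn't there
```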
I would be interested in what the uncertain imitator's questions would look like in these cases.
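I picture something like an imitation policy that asks whenever its prediction of "what would the human do here?" is too uncertain; a toy sketch of what such a question might look like (names and thresholds are made up):

```python
import math

def entropy(probs):
    """Shannon entropy of the imitator's predicted human-action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertain_imitator_step(state, predicted_action_probs, threshold=1.0):
    """Act when confident about the human's behaviour; otherwise ask."""
    if entropy(predicted_action_probs) > threshold:
        return ("ask", f"I've never seen you act in a state like {state!r}; "
                       f"is this intermediate step still something you'd endorse?")
    best = max(range(len(predicted_action_probs)),
               key=lambda i: predicted_action_probs[i])
    return ("act", best)

# A state the demonstrator never reached: the imitator is unsure and asks.
print(uncertain_imitator_step("novel_malaria_lab_protocol", [0.3, 0.35, 0.35]))
```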