I dunno, I think you can generalize reward further than behavior. E.g. I might very reasonably issue high reward for winning a game of chess, or arriving at my destination safe and sound, or curing malaria, even if each involved intermediate steps that don’t make sense as ‘things I might do.’
I do agree there are limits to how much extrapolation we actually want; I just think there’s a lot of headroom for AIs to achieve ‘normal’ ends via ‘abnormal’ means.
I would be interested in what the uncertain imitator’s questions would look like in these cases.