I note that even experts sometimes sloppily talk as if RL agents make plans towards the goal of maximizing future reward—see for example Pitfalls of Learning a Reward Function Online.
Fwiw, I think most analysis of this form starts from the assumption “the agent is maximizing future reward” and then reasons out from there. I agree with you that such analysis probably doesn’t apply to RL agents directly (since RL agents do not necessarily make plans towards the goal of maximizing future reward), but it can apply to e.g. planning agents that are specifically designed that way.
(Idk what the people who make such analyses have in mind for what sorts of agents we’ll actually build; I wish they would be clearer on this.)
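To make the distinction concrete, here’s a minimal sketch (my own toy illustration, not anything from the linked post; the gridworld and the names `train_q_learning` / `plan_for_reward` are assumptions for the example). The model-free agent only “sees” reward inside its training update and acts by table lookup, whereas the planning agent explicitly searches over action sequences to maximize predicted future reward at decision time.

```python
import numpy as np

# Toy 1D gridworld: states 0..4, reward 1 for reaching state 4.
# (Hypothetical setup, just to illustrate the distinction above.)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / right

def step(s, a):
    s2 = int(np.clip(s + a, 0, N_STATES - 1))
    return s2, float(s2 == GOAL)

# (1) Model-free RL agent: tabular Q-learning. Reward only appears in the
# training update; at decision time the agent just looks up Q-values and has
# no explicit "future reward" objective it is planning towards.
def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1,
                     rng=np.random.default_rng(0)):
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a_idx = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
            s2, r = step(s, ACTIONS[a_idx])
            Q[s, a_idx] += alpha * (r + gamma * Q[s2].max() - Q[s, a_idx])
            s = s2
            if s == GOAL:
                break
    return Q

# (2) Planning agent "specifically designed that way": at decision time it
# searches over action sequences to maximize predicted cumulative reward.
def plan_for_reward(s, horizon=4, gamma=0.9):
    def search(state, depth):
        if depth == 0:
            return 0.0
        return max(r + gamma * search(s2, depth - 1)
                   for a in ACTIONS for s2, r in [step(state, a)])
    best_return, best_first_action = -np.inf, ACTIONS[0]
    for a in ACTIONS:
        s2, r = step(s, a)
        ret = r + gamma * search(s2, horizon - 1)
        if ret > best_return:
            best_return, best_first_action = ret, a
    return best_first_action

Q = train_q_learning()
print("Q-learner's action from state 2:", ACTIONS[int(Q[2].argmax())])
print("Planner's action from state 2:  ", plan_for_reward(2))
```

Both pick “move right” here, but only the second is, by construction, choosing actions *because* they maximize predicted future reward; the first just executes whatever policy the training process happened to install.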
I still don’t think it’s a good idea to control a robot with a literal remote-control reward button; I just don’t think that the robot will necessarily want to grab that remote from us. It might or might not want to. It’s a complicated and interesting question.
+1, and I think the considerations are pretty similar to those in model-free RL.