Could you spell out what makes the quotes incongruous with each other? It’s not jumping out at me.
The authors acknowledged that modifying RL the way they did “brings RL closer to the frameworks of decision theory and game theory” (AFAICT, the algorithms they end up with are nearly pure decision/game theory). But given that some researchers have focused on decision theory for a long time precisely because a decision-theoretic agent can be reflectively stable, it seems incongruous to also write “perhaps surprisingly, there are modifications of the RL objective that remove the agent’s incentive to tamper with the reward function.”
We didn’t expect this to be surprising to the LessWrong community. Many RL researchers tend to be surprised, however.
Ah, that makes sense. I kind of guessed that the target audience is RL researchers, but still misinterpreted “perhaps surprisingly” as a claim of novelty instead of an attempt to raise the interest of the target audience.