I specifically mean ambitious value learning using IRL. The resulting algorithm will look quite different from IRL as it currently exists. (In particular, assuming humans are reinforcement learners is problematic.)