Importantly, this only works for narrow value learning, not what Paul calls “ambitious value learning” (learning long-term preferences). Narrow value learning has much more in common with imitation learning than with ambitious value learning; at best, you end up with something that pursues subgoals similar to the ones humans pursue.
The concern in the original post applies to ambitious value learning. (But ambitious value learning using IRL already looks pretty doomed anyway.)
I wrote that post you link to, and I don’t think ambitious value learning is doomed at all—just that we can’t do it the way we traditionally attempt to.
I specifically mean ambitious value learning using IRL. The resulting algorithm will look quite different from IRL as it currently exists. (In particular, the assumption that humans are reinforcement learners is problematic.)
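For reference, standard IRL typically models the human demonstrator as a (noisily) rational reinforcement learner, e.g. via Boltzmann rationality, and then infers the reward that best explains their behavior under that model. A minimal sketch of that assumption (the function name and toy Q-values below are purely illustrative, not from any particular codebase):

```python
import numpy as np

def boltzmann_policy(q_values, beta=1.0):
    """The usual IRL human model: the demonstrator picks actions with
    probability proportional to exp(beta * Q(s, a)), i.e. they are
    treated as a noisily-optimal reinforcement learner."""
    logits = beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()           # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy example: three actions with Q-values under some candidate reward.
q = [1.0, 0.5, -2.0]
print(boltzmann_policy(q, beta=2.0))
```

IRL then searches for a reward function that makes the observed human actions likely under this model; the objection above is to the model itself, since real humans are not (even noisily) optimal RL agents.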