On point (e), I know people have written before about how there are many Newcomb-like problems, but do we have any sense of just how many decision problems are enough like Newcomb that this is likely to be an issue? To me this whole issue seems troubling (as you suggest) unless Newcomb-like problems are not the norm, even if they feel like the norm to people worried about solving decision problems.
You don’t need Newcomb-like problems to see this arise. This is a specific example of a more general problem with inverse reinforcement learning (IRL): if you have a bad model of how the human chooses actions given a utility function, then you will infer the wrong utility function. You could have a bad model of the human’s decision theory, or you could fail to realize that humans are subject to the planning fallacy and so infer that humans must enjoy missing deadlines.
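To make the planning-fallacy case concrete, here is a minimal toy sketch, not taken from any of the papers below. Everything in it is an assumption chosen for illustration: the two actions (start early or start late), the leisure bonus, the probabilities, and the Boltzmann-rational choice rule with temperature 1. The human values meeting the deadline at `TRUE_U` but overestimates the chance of meeting it after a late start. An observer who models those mistaken beliefs recovers the true value; an observer who assumes the human knows the actual probabilities infers that the human barely cares about deadlines.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def p_start_late(u, p_meet_early, p_meet_late, leisure_bonus):
    """Probability that a Boltzmann-rational agent (temperature 1) starts late.

    u is the utility of meeting the deadline; missing it is worth 0.
    The probabilities are whatever the agent *believes*, which may be wrong.
    """
    eu_early = p_meet_early * u
    eu_late = p_meet_late * u + leisure_bonus
    return sigmoid(eu_late - eu_early)


def infer_u(observed_p_late, p_meet_early, p_meet_late, leisure_bonus):
    """Invert the choice model: which u makes the observed behavior likely?

    The probabilities here are the ones the *observer assumes* the human uses.
    """
    return (logit(observed_p_late) - leisure_bonus) / (p_meet_late - p_meet_early)


# All numbers below are hypothetical, chosen only for illustration.
TRUE_U = 2.0            # how much the human actually values meeting the deadline
LEISURE = 0.5           # known bonus for starting late
BELIEVED = (0.99, 0.90) # human's beliefs: P(meet | early), P(meet | late)
ACTUAL = (0.95, 0.20)   # what really happens: P(meet | early), P(meet | late)

# Behavior is generated by the human's (mistaken) beliefs.
observed = p_start_late(TRUE_U, *BELIEVED, LEISURE)

# Observer that models the planning fallacy correctly.
u_correct = infer_u(observed, *BELIEVED, LEISURE)

# Observer that assumes the human knows the actual probabilities.
u_misspecified = infer_u(observed, *ACTUAL, LEISURE)

print(f"correct human model:      u = {u_correct:.2f}")       # about 2.00
print(f"misspecified human model: u = {u_misspecified:.2f}")  # about 0.24
```

The behavior is identical in both cases; only the observer's model of how the human turns utilities into actions differs, and that alone changes the inferred utility by nearly a factor of ten in this toy setup.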
Readings that argue this is hard:
Impossibility of deducing preferences and rationality from human policy
Model Mis-specification and Inverse Reinforcement Learning
The easy goal inference problem is still hard
Work that tries to address it:
Learning the Preferences of Ignorant, Inconsistent Agents
Learning the Preferences of Bounded Agents (very similar)
My own work (coming soon)