Gordon Seidoh Worley comments on Problems integrating decision theory and inverse reinforcement learning

Gordon Seidoh Worley 8 May 2018 19:31 UTC
1 point
On point (e) I know people have written before about how there are many Newcomb-like problems, but do we have any sense of just how many decision problems are enough like Newcomb that this is likely to be an issue? To me this whole issue seems troubling (as you suggest) unless Newcomb-like problems are not the norm, even if they feel like the norm to people worried about solving decision problems.
- Rohin Shah 10 May 2018 1:35 UTC
  3 points
  You don’t need Newcomb-like problems to see this arise. This is a specific example of a more general problem with IRL, which is that if you have a bad model of how the human chooses actions given a utility function, then you will infer the wrong thing. You could have a bad model of the human’s decision theory, or you could not realize that humans are subject to the planning fallacy and so infer that humans must enjoy missing deadlines.
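  To make the planning-fallacy case concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than anything from the papers listed here: the deadline, the durations, the two candidate utility weights, and the Boltzmann-rational choice model are all made up. The human values meeting the deadline but believes the task takes 2 days when it really takes 4. An observer who assumes the human plans with the true duration ends up preferring the hypothesis that the human likes being late.

```python
import math

# All numbers below are hypothetical, chosen only for illustration.
DEADLINE = 5            # day the task is due
TRUE_DURATION = 4       # days the task actually takes
BELIEVED_DURATION = 2   # days the human thinks it takes (planning fallacy)
START_DAYS = range(5)   # possible days on which to start the task
CANDIDATE_WEIGHTS = [-10.0, 10.0]  # "enjoys being late" vs "wants to be on time"


def utility(start, duration, w_deadline):
    """One unit of leisure per day of delay, plus w_deadline if finished on time."""
    on_time = 1.0 if start + duration <= DEADLINE else 0.0
    return start + w_deadline * on_time


def likelihood(observed_start, duration, w_deadline):
    """Boltzmann-rational probability of the observed start day (assumed choice model)."""
    scores = [math.exp(utility(s, duration, w_deadline)) for s in START_DAYS]
    return scores[observed_start] / sum(scores)


def infer_weight(observed_start, assumed_duration):
    """Maximum-likelihood weight (uniform prior) under the observer's assumed duration."""
    return max(
        CANDIDATE_WEIGHTS,
        key=lambda w: likelihood(observed_start, assumed_duration, w),
    )


# The human truly wants to be on time (weight +10) but plans with the wrong duration.
observed_start = max(START_DAYS, key=lambda s: utility(s, BELIEVED_DURATION, 10.0))

# Observer with a bad model of the human: assumes planning uses the true duration.
misspecified_guess = infer_weight(observed_start, TRUE_DURATION)
# Observer that models the planning fallacy: uses the human's believed duration.
well_specified_guess = infer_weight(observed_start, BELIEVED_DURATION)

print("observed start day:", observed_start)
print("inferred weight, misspecified model:", misspecified_guess)
print("inferred weight, planning-fallacy model:", well_specified_guess)
```

  The observed behavior is identical in both cases; only the observer's assumed model of how the human turns a utility function into an action differs, and that alone flips the sign of the inferred preference in this toy setup.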
  Readings that argue this is hard:
  Impossibility of deducing preferences and rationality from human policy
  Model Mis-specification and Inverse Reinforcement Learning
  The easy goal inference problem is still hard
  Work that tries to address it:
  Learning the Preferences of Ignorant, Inconsistent Agents
  Learning the Preferences of Bounded Agents (very similar)
  My own work (coming soon)