Just to make sure I understand: You’re arguing that even if we somehow solve the easy goal inference problem, there will still be some aspect of values we don’t capture?
Yeah. I think a creature behaving just like me doesn’t necessarily have the exact same internal experiences. Across all possible creatures, there are degrees of freedom in internal experiences that aren’t captured by actions. Some of these might be value-relevant.
Yeah, in ML language, you’re describing the unidentifiability problem in inverse reinforcement learning—for any behavior, there are typically many reward functions for which that behavior is optimal.
Though another way this could be true is if “internal experience” depends on what algorithm you use to generate your behavior, and “optimize a learned reward” doesn’t meet the bar. (For example, I don’t think a giant lookup table that emulates my behavior is having the same experience that I am.)
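
To make the unidentifiability point above concrete, here is a minimal sketch (the action names and reward values are illustrative assumptions, not anything from the dialogue): two different reward functions over the same one-step choice rationalize exactly the same optimal behavior, so observing that behavior cannot tell them apart.

```python
# Minimal sketch of reward unidentifiability in a one-step setting.
# The action names and reward values are illustrative assumptions.

ACTIONS = ["left", "right"]

# Two candidate reward functions the observed agent might be optimizing.
# reward_b is a positive affine transform of reward_a (2*r + 5), so it
# ranks the actions identically.
reward_a = {"left": 0.0, "right": 1.0}
reward_b = {"left": 5.0, "right": 7.0}

def optimal_action(reward):
    """Return the action with the highest reward under this reward function."""
    return max(ACTIONS, key=lambda a: reward[a])

# Both reward functions rationalize exactly the same observed behavior,
# so behavior alone cannot identify which one the agent "really" has.
assert optimal_action(reward_a) == optimal_action(reward_b) == "right"
print("Both reward functions make 'right' optimal.")
```

The same ambiguity persists in full MDPs: for instance, positively rescaling the reward leaves the optimal policy unchanged, so optimal behavior never pins down a unique reward function.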