I think I’m understanding you to be conceptualizing a dichotomy between “uncertainty over a utility function” vs. “looking for the one true utility function”. (I’m also getting this from your comment below:
> One caveat is that I think the uncertainty over preferences/rewards is key to this story, which is a bit different from getting a single true utility function.
).
I can’t figure out, on my own, a sense in which this dichotomy exists. To be uncertain about a utility function is to believe there is one correct one while updating one’s probabilities about its identity.
Also, for what it’s worth, in the case where there is an unidentifiability problem, as there is here, even in the limit, a Bayesian agent won’t converge to certainty about a utility function.
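To make the unidentifiability point concrete, here is a toy sketch (my own construction, purely illustrative): two reward hypotheses that differ only by an additive constant give a Boltzmann-rational demonstrator exactly the same behavior, so no amount of data moves the posterior between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two reward hypotheses over three actions. u2 = u1 + 1, so a
# Boltzmann-rational demonstrator behaves identically under both:
# softmax is invariant to additive constants.
u1 = np.array([1.0, 0.5, 0.0])
u2 = u1 + 1.0

def boltzmann(u, beta=5.0):
    """P(a | u) proportional to exp(beta * u[a])."""
    p = np.exp(beta * u)
    return p / p.sum()

posterior = np.array([0.5, 0.5])  # credence in (u1, u2)
demo_policy = boltzmann(u1)       # demonstrations actually come from u1

for _ in range(10_000):
    a = rng.choice(3, p=demo_policy)                 # observe one action
    lik = np.array([boltzmann(u1)[a], boltzmann(u2)[a]])
    posterior = posterior * lik / (posterior @ lik)  # Bayes update

print(posterior)  # still ~[0.5, 0.5]: no convergence, even in the limit
```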
> I think I’m understanding you to be conceptualizing a dichotomy between “uncertainty over a utility function” vs. “looking for the one true utility function”.
Well, I don’t personally endorse this. I was speculating on what might be relevant to Stuart’s understanding of the problem.
I was trying to point towards the dichotomy between “acting while having uncertainty over a utility function” vs. “acting with a known, certain utility function” (see e.g. The Off-Switch Game). I do know about the problem of fully updated deference, but I don’t know what Stuart thinks about it.
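For concreteness, here is a tiny numeric version of the Off-Switch Game’s central comparison (my sketch; the Gaussian belief over the human’s utility U for the proposed action is an assumption made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Robot's belief over the human's utility U for its proposed action.
U = rng.normal(loc=-0.2, scale=1.0, size=100_000)

eu_act   = U.mean()                    # act now: E[U]
eu_off   = 0.0                         # switch off: utility 0 by convention
eu_defer = np.maximum(U, 0.0).mean()   # defer: a rational human allows the
                                       # action only when U > 0, so E[max(U, 0)]

print(f"act:   {eu_act:+.3f}")
print(f"off:   {eu_off:+.3f}")
print(f"defer: {eu_defer:+.3f}")       # E[max(U, 0)] >= max(E[U], 0)
```

With a known, certain utility function the belief has zero variance, max(U, 0) collapses to max(E[U], 0), and deferring gains nothing; the robot’s incentive to leave the off switch alone comes entirely from its uncertainty.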
> Also, for what it’s worth, in the case where there is an unidentifiability problem, as there is here, even in the limit, a Bayesian agent won’t converge to certainty about a utility function.
Agreed, but I’m not sure why that’s relevant. Why do you need certainty about the utility function, if you have certainty about the policy?
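A toy illustration of that question (reusing the pair of hypotheses from the sketch above): the agent can stay permanently split 50/50 between two utility functions and still be certain what to do, because both hypotheses pick the same action everywhere.

```python
import numpy as np

# Two behaviorally equivalent reward hypotheses, held with 50/50 credence.
u1 = np.array([1.0, 0.5, 0.0])
u2 = u1 + 1.0
posterior = np.array([0.5, 0.5])

# The posterior-expected utility ranks actions the same way as u1 and u2,
# so the optimal policy is fully determined despite the reward uncertainty.
expected_u = posterior[0] * u1 + posterior[1] * u2
assert np.argmax(u1) == np.argmax(u2) == np.argmax(expected_u)
print("optimal action:", np.argmax(expected_u))
```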
Okay, maybe we don’t disagree on anything. I was trying to make a different point with the unidentifiability problem, but it was tangential to begin with, so never mind.