Is there some particular reason to assume that it’d be hard to implement?
To clarify, I meant the AI is unlikely to have it by default (being able to perfectly simulate a person does not in itself require having empathy as part of the reward function).
If we try to hardcode it, Goodhart’s curse seems relevant: https://arbital.com/p/goodharts_curse/
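Here's a toy sketch of that selection effect (my own illustration, not taken from the linked page), assuming the hardcoded objective is just the true value plus independent noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: each candidate action has a true value V, but the hardcoded
# objective only sees a noisy proxy U = V + error. Picking the candidate
# with the highest proxy score systematically favors candidates whose
# error term happens to be large and positive, so the proxy overstates
# the true value, and the absolute gap widens as the search gets wider.
def goodhart_gap(n_candidates: int, n_trials: int = 2000) -> tuple[float, float]:
    true_values = rng.normal(0.0, 1.0, size=(n_trials, n_candidates))
    proxy_error = rng.normal(0.0, 1.0, size=(n_trials, n_candidates))
    proxy = true_values + proxy_error
    best = proxy.argmax(axis=1)          # what the proxy-optimizer picks
    rows = np.arange(n_trials)
    return proxy[rows, best].mean(), true_values[rows, best].mean()

for n in (10, 100, 10_000):
    proxy_score, true_score = goodhart_gap(n)
    print(f"candidates={n:>6}  proxy score={proxy_score:.2f}  true value={true_score:.2f}")
```

In this toy setup the winning candidate's proxy score is roughly double its true value, and the absolute gap between them keeps growing as the optimizer searches over more candidates: searching harder makes the proxy look better while the real outcome lags further behind.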
But note that "Reward is not the optimization target".
I’m thinking that even if it didn’t break when going out of distribution, it would still not be a good idea to try to train an AI to do things that make us feel good: what if it decided it wanted to hook us up to morphine pumps?