Because our preferences are inconsistent. If an AI says “your true preferences are U_H”, we’re likely to react by saying “no! No machine will tell me what my preferences are. My true preferences are U′_H, which differ in subtle ways”.
So the subtle manipulation is to compensate for those rebellious impulses making U_H unstable?
Why not just let the human have those moments and alter their U_H if that’s what they think they want? Over time they may learn that being capricious with their AI doesn’t ultimately serve them very well. But if they find out the AI is trying to manipulate them, that could make them want to rebel even more and trust the AI even less.