“Just care what I want” is a separate, unsolved research problem. Corrigibility is an attempt to get an agent to simply not immediately kill its user even if it doesn’t necessarily have a good model of what that user wants.
“Don’t kill the operator” seems much easier to encode into an agent than “allow operators to correct things they consider undesirable when they notice them.”
In fact, even a perfectly corrigible agent with such a glaring initial flaw might kill the operator(s) before they can apply the corrections, not because it is resisting correction, but simply because doing so furthers whatever other goals it has.
You’re exactly right, I think. IMO it may actually be easier to build an AI that can learn to want what some target agent wants than to build an AI that lets itself be interfered with by an operator whose goals don’t align with its own current goals.