That’s true; it will resist changes to its “outer” utility function U. But it won’t resist changes to its “inner” utility function v, which still leaves a lot of flexibility, even though v isn’t its true utility function in the VNM sense. That restriction isn’t strong enough to avoid the problem I pointed out above.
I will only allow v to change if that change triggers the “U adaptation” (the adding and subtracting of constants). You have to specify which processes count as U adaptations (certain types of conversations with certain people, e.g.) and which don’t.
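To make the “U adaptation” concrete, here is a minimal sketch in my own notation (not a formal definition from the original post): when an allowed process swaps the inner utility v for w at time t, with history h_t, the outer utility absorbs a compensating constant so that the agent’s expected value is unchanged at the moment of the swap:

$$
U \;=\;
\begin{cases}
v & \text{before the allowed update,} \\[4pt]
w \;+\; \big(\mathbb{E}[v \mid h_t] - \mathbb{E}[w \mid h_t]\big) & \text{after the allowed update.}
\end{cases}
$$

Because the constant exactly cancels the expected gain or loss from the swap, a U-maximiser has no incentive either to cause or to prevent an allowed update; it is indifferent to it.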
Oh, I see. So the AI simply losing the memory that v was stored in and replacing it with random noise shouldn’t count as something it will be indifferent about? How would you formalize this such that arbitrary changes to v don’t trigger the indifference?
By specifying what counts as an allowed change in U, and making the agent into a U maximiser. Then, just as standard maximisers defend their utilities, it should defend U (including the allowed update, and only that update).
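As a sketch of why a U-maximiser would resist disallowed changes (again informal notation, with h_t the current history): if some process overwrites the stored v with noise n without triggering the adaptation, no compensating constant is added, and U keeps scoring outcomes by the old v. The agent foresees that after the overwrite it will choose actions optimised for n rather than v:

$$
a^{*}_{u} \;:=\; \arg\max_a \,\mathbb{E}[\,u \mid a, h_t\,],
\qquad
\mathbb{E}[\,v \mid a^{*}_{n}, h_t\,] \;\le\; \mathbb{E}[\,v \mid a^{*}_{v}, h_t\,],
$$

so the overwrite costs it expected U, and it will act to prevent it, just as an ordinary maximiser defends its utility function.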
I think there is a genuine problem here… the AI imposes no obstacle to “trusted programmers” changing its utility function. But apart from the human difficulties (the programmers could be corrupted by power, make mistakes, etc.), what stops the AI manipulating the programmers into changing its utility function, e.g. changing a hard-to-satisfy v into some w which is very easy to satisfy and gives it a very high score?