Isn’t that essentially a false belief about one’s own preferences?
I mean, the AI’s “true” VNM utility function, to the extent that it has one, is going to be different from the utility function the AI reflectively thinks it has. In principle the AI could discover the difference, and this could cause it to alter its behavior.
Or maybe not; I don’t have a strong intuition about this at the moment. But if I recall correctly, in the previous work on corrigibility (I haven’t read the latest version you linked yet), Soares was thinking of using causal decision nodes to implement utility indifference for the shutdown problem. This effectively introduces false beliefs into the agent, as the agent is mistaken about what causes the button to be pressed.
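For concreteness, here is one toy shape that kind of construction could take (this is my own illustrative sketch, not the definition from the corrigibility paper; U_N and U_S stand for the “normal” and “shutdown” utility functions, and M for the agent’s causal model):

$$
U(a) \;=\; \mathbb{E}_{w \sim M}\big[\,U_S(a, w)\,\big] \;+\; \mathbb{E}_{w \sim M_{\mathrm{do}(\mathrm{signal}\,:=\,\mathrm{off})}}\big[\,U_N(a, w)\,\big]
$$

The second expectation is taken in the intervened model where the press→signal link is cut and the signal is held off no matter what the humans do, which is where the “mistaken about the causes” description gets its pull.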
My preferred interpretation of that particular method is not “the agent has false beliefs,” but instead “the agent cares both about the factual and the counterfactual worlds, and is trying to maximize utility in both at once.” That is, if you were to cry
“But if the humans press the button, the press signal will occur! So why are you acting such that you still get utility in the counterfactual world where humans press the button and the signal fails to occur?”
It will look at you funny, and say “Because I care about that counterfactual world. See? It says so right here in my utility function.” It knows the world is counterfactual; it just cares about “what would have happened” anyway. (Causal decision nodes are used to formalize “what would have happened” in the agent’s preferences; this says nothing about whether the agent uses causal reasoning when making decisions.)
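As a concrete (and heavily simplified) illustration of the “cares about both worlds” reading, here is a toy sketch; the world model, the actions, and the utilities u_normal / u_shutdown are all invented for the example and are not the formalism from the corrigibility paper.

```python
# Toy illustration only: an agent whose score for an action adds up value in
# the factual world (where pressing the button really does cause the shutdown
# signal) and in the counterfactual world where the press fails to cause it.

def outcome(action, button_pressed, press_causes_signal):
    """Tiny world model: does the agent end up shut down or still running?"""
    signal = button_pressed and press_causes_signal
    if action == "disable_button":
        signal = False  # disabling the button suppresses the signal
    return "shut_down" if signal else "running"

def u_shutdown(result):
    # Value placed on having complied with a shutdown signal.
    return 10 if result == "shut_down" else 0

def u_normal(result):
    # Value placed on continuing the normal task.
    return 10 if result == "running" else 0

def combined_utility(action, button_pressed):
    # Factual world: the press does cause the signal.
    factual = outcome(action, button_pressed, press_causes_signal=True)
    # Counterfactual world the agent *also* cares about: same press, but the
    # signal fails to occur ("what would have happened" without the signal).
    counterfactual = outcome(action, button_pressed, press_causes_signal=False)
    return u_shutdown(factual) + u_normal(counterfactual)

if __name__ == "__main__":
    for action in ("comply", "disable_button"):
        print(action, combined_utility(action, button_pressed=True))
    # comply         -> 20 (shut down in the factual world, running in the counterfactual one)
    # disable_button -> 10 (it keeps running, but forfeits the value of the factual shutdown)
```

The agent here is not confused about which world is real; the counterfactual term is simply part of what it values, which is exactly the distinction the parenthetical above is drawing.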
This greatly clarified the distinction for me. Well done.
Makes sense.
No. It’s an adjusted preference that functions in practice just like a false belief.