I’m generally skeptical about these frameworks that require agents to hold epistemically false beliefs.
What if the AI finds out about module M through a side channel? Depending on the details, either it will correctly update on the evidence and start to behave accordingly, or it will enter an inconsistent epistemic state and thus possibly behave erratically.
I’d be using utility indifference, rather than incorrect beliefs. It serves a similar purpose, without causing the AI to believe anything incorrect.
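As a rough illustration of what utility indifference is doing here (a toy sketch with made-up action names, numbers, and a simplified compensation rule, not the exact construction from the literature):

```python
# Toy sketch of utility indifference (simplified; not the exact construction).
# The agent's effective utility is U_N when the button is not pressed and
# U_S plus a compensating term theta when it is pressed. theta is chosen so
# that, for each action, the press and no-press branches are equally valuable,
# so changing the probability of the press buys the agent nothing.

def effective_utility(action, p_press, u_normal, u_shutdown):
    """Expected indifference-adjusted utility of `action`.

    p_press[action]    -- probability the button gets pressed given the action
    u_normal[action]   -- expected U_N given the action and no press
    u_shutdown[action] -- expected U_S given the action and a press
    """
    theta = u_normal[action] - u_shutdown[action]  # compensating term
    p = p_press[action]
    return (1 - p) * u_normal[action] + p * (u_shutdown[action] + theta)

# Made-up numbers: "manipulate" tries to prevent the press, at some cost.
p_press    = {"comply": 0.9, "manipulate": 0.1}
u_normal   = {"comply": 10.0, "manipulate": 9.0}
u_shutdown = {"comply": 0.0,  "manipulate": 0.0}

for a in ("comply", "manipulate"):
    print(a, effective_utility(a, p_press, u_normal, u_shutdown))
# comply 10.0, manipulate 9.0: each action is scored purely by its U_N value,
# so the agent gains nothing by manipulating whether the button is pressed,
# and it never needs to hold a false belief about the button.
```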
Isn’t that essentially a false belief about one’s own preferences?
I mean, the AI’s “true” VNM utility function, to the extent that it has one, is going to be different from the utility function the AI reflectively thinks it has. In principle, the AI could find out the difference, and this could cause it to alter its behavior.
Or maybe not; I don’t have a strong intuition about this at the moment. But if I recall correctly, in the previous work on corrigibility (I haven’t read the latest version you linked yet), Soares was thinking of using causal decision nodes to implement utility indifference for the shutdown problem. This effectively introduces false beliefs into the agent, as the agent is mistaken about what causes the button to be pressed.
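To make the “mistaken about what causes the press” reading concrete, here is a toy model of my own (invented names and probabilities, not the formalism from the corrigibility work): in reality the humans’ press produces the signal, but in the model the agent uses to score actions, the signal is read off an exogenous switch, so observing the press changes nothing in its evaluation.

```python
# Toy model of the "mistaken about causes" reading (invented probabilities).
# Reality: the humans' press causes the shutdown signal.
# Agent's evaluation model: the signal is read off an exogenous switch,
# causally disconnected from the press.

P_SWITCH_ON = 0.5  # made-up prior on the exogenous switch

def real_signal_prob(p_humans_press: float) -> float:
    # In the real world, the signal occurs exactly when the humans press.
    return p_humans_press

def modelled_signal_prob(p_humans_press: float) -> float:
    # In the agent's decision-time model, the humans' behaviour is ignored:
    # the press no longer causes the signal.
    return P_SWITCH_ON

print(real_signal_prob(1.0))      # 1.0 -- pressing will produce the signal
print(modelled_signal_prob(1.0))  # 0.5 -- the agent still weights the branch
                                  # where the signal fails to occur
```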
My preferred interpretation of that particular method is not “the agent has false beliefs,” but instead “the agent cares both about the factual and the counterfactual worlds, and is trying to maximize utility in both at once.” That is, if you were to cry:
“But if the humans press the button, the press signal will occur! So why are you acting such that you still get utility in the counterfactual world where humans press the button and the signal fails to occur?”
it will look at you funny and say, “Because I care about that counterfactual world. See? It says so right here in my utility function.” It knows the world is counterfactual; it just cares about “what would have happened” anyway. (Causal decision nodes are used to formalize “what would have happened” in the agent’s preferences; this says nothing about whether the agent uses causal reasoning when making decisions.)
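A minimal sketch of this “cares about both worlds” reading, with made-up action names, weights, and utilities (my illustration, not the paper’s formalism): the agent’s beliefs track the factual world correctly, but its utility function awards points in the counterfactual world too, so it favors actions that do well in both.

```python
# Toy sketch of a utility function that cares about both worlds
# (made-up names, weights, and utilities).

WEIGHT_FACTUAL = 0.5         # weight on the world where press -> signal
WEIGHT_COUNTERFACTUAL = 0.5  # weight on the world where the signal fails

# Utility each action earns in each world.
u_factual        = {"ignore_counterfactual": 1.0, "do_well_in_both": 0.8}
u_counterfactual = {"ignore_counterfactual": 0.0, "do_well_in_both": 0.8}

def utility(action: str) -> float:
    """Weighted sum of what the action achieves in each of the two worlds."""
    return (WEIGHT_FACTUAL * u_factual[action]
            + WEIGHT_COUNTERFACTUAL * u_counterfactual[action])

best = max(u_factual, key=utility)
print(best, utility(best))  # do_well_in_both 0.8
# The agent knows the factual world is the one that will actually obtain; it
# still favours the action that also scores in the counterfactual world,
# because that world appears in its utility function, not in its beliefs.
```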
This greatly clarified the distinction for me. Well done.
Makes sense.
No. It’s an adjusted preference that functions in practice just like a false belief.