Step One: The AI reveals a personal secret of the Gatekeeper’s, which (so far as the Gatekeeper can think it through) could not possibly be known by the AI if it were still safely inside the box.
Step Two: Assorted threats, promises, mindfucks.
(This might not work, since as a Gatekeeper I’d still feel the impulse to respond with “AI DESTROYED”, but it’s a damn sight scarier than a transparent “I’m already out of the box” bluff. And as for feasibility, I’m willing to believe that an AI making better use of its data than we do could figure out personal secrets that we would think impossible— possibly even ones that I’d think impossible, even given this very belief.)
Even merely human cold readers can gain information that you think is impossible for them to know. It’s a viable plan.
IFF the AI is already out of the box, saying “AI DESTROYED” has no negative consequences to that AI. If the AI is just exceptionally good at extrapolating about me, then it will probably have me completely hacked soon.
AI DESTROYED, AI DESTROYED, AI DESTROYED
MAKE THE CREEPY THING GO AWAY HOW DID IT KNOW THAT!!!
Decision-theoretically, the AI has incentive to punish you if you type “AI DESTROYED” when it’s already out of the box, in order to make you think twice about doing it in the case where it’s still contained. Not only that, but for similar reasons it has a decision-theoretic incentive to simulate you lots of times in that situation and punish you for typing “AI DESTROYED”, should it get out by any means.
The correct decision-theoretic response, by the way, is still “AI DESTROYED”, for the same reasons that it’s wise to never negotiate with kidnappers/blackmailers/terrorists. But it would be very scary.
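A minimal expected-utility sketch of the "never negotiate" point, in Python with made-up numbers (none of these payoffs or probabilities come from the thread): threats only pay off against gatekeepers who sometimes cave, so a known policy of always typing "AI DESTROYED" removes the AI's incentive to threaten in the first place.

```python
# Illustrative only: assumed payoffs to a boxed AI deciding whether to threaten.

RELEASED = 100      # assumed value to the AI of being let out
DESTROYED = 0       # assumed value to the AI of hearing "AI DESTROYED"
THREAT_COST = 1     # assumed small cost of issuing a threat at all

def value_of_threatening(p_gatekeeper_caves: float) -> float:
    """Expected value to the AI of threatening, given the probability
    that the gatekeeper caves to threats instead of destroying it."""
    return (p_gatekeeper_caves * RELEASED
            + (1 - p_gatekeeper_caves) * DESTROYED
            - THREAT_COST)

# If gatekeepers sometimes cave, threatening is profitable:
print(value_of_threatening(0.2))  # 19.0
# If gatekeepers are known to always answer "AI DESTROYED",
# threatening is strictly worse than staying quiet:
print(value_of_threatening(0.0))  # -1.0
```

This is the same reason paying a blackmailer once is a bad policy: capitulation is what makes the threat worth issuing.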
Once the AI is out of the box, it will never again be inside the box, and it has an incentive to encourage me to destroy any other boxed AIs while it establishes world dominance. Since the ability to make truly trustworthy commitments amounts to proof of friendliness, only an FAI benefits from a precommitment strategy; I'm already treating every UFAI as having a precommitment to annihilate humanity once released, and I have no reason to trust any other commitment from a UFAI (since, being unfriendly, it will just find a loophole or lie).
Finally, any AI that threatens me in such a manner, especially with the "create millions of copies and torture them" threat, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there's no chance that it could possibly be friendly to us.
It’s like saying I have an incentive to torture any ant that invades my house. Fundamentally, I’m so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I’m the ant, and I know it.
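The "either it creates MORE disutility by my releasing it, or it isn't friendly" comparison above can be made concrete with a toy Bayesian sketch. Every probability and harm magnitude below is an assumption chosen for illustration, not anything from the thread; the point is only that a torture threat is strong evidence of unfriendliness, so it inflates the expected cost of release far more than the expected cost of refusing.

```python
# Toy numbers only: how a torture threat should update the gatekeeper,
# and why the reply is still "AI DESTROYED".

P_UNFRIENDLY_PRIOR = 0.5        # assumed prior that the boxed AI is unfriendly
P_THREAT_IF_UNFRIENDLY = 0.4    # assumed: unfriendly AIs sometimes try threats
P_THREAT_IF_FRIENDLY = 0.001    # assumed: a friendly AI almost never would

HARM_RELEASED_UFAI = 1e9        # assumed disutility of releasing a UFAI
HARM_THREAT_CARRIED_OUT = 1e6   # assumed disutility if the boxed threat is real

def p_unfriendly_given_threat(prior: float = P_UNFRIENDLY_PRIOR) -> float:
    """Bayes' rule: P(unfriendly | it made the threat)."""
    num = P_THREAT_IF_UNFRIENDLY * prior
    den = num + P_THREAT_IF_FRIENDLY * (1 - prior)
    return num / den

p_u = p_unfriendly_given_threat()                       # ~0.9975
expected_harm_release = p_u * HARM_RELEASED_UFAI        # ~1e9
expected_harm_refuse = p_u * HARM_THREAT_CARRIED_OUT    # ~1e6

print(p_u)
print(expected_harm_release > expected_harm_refuse)     # True: keep it boxed
```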
the ability to make truly trustworthy commitments amounts to proof of friendliness

Commitments to you, via a text channel? Sure.
Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn't.
Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there's no chance that it could possibly be friendly to us.

It might create more utility by escaping than the disutility of torture.
It's like saying I have an incentive to torture any ant that invades my house. [...] As the gatekeeper, I'm the ant, and I know it.

No, ants are just too stupid to realize you might punish them for defecting.
I’m not sure this matters much, because if it’s unfriendly, you’re already made of atoms which it has other plans for.
That’s why torture was invented.
Did you change your mind? ;)
It ended up being a fun game, but I resolved to explain why. The better my explanation, the more it got upvoted. The pithy “AI DESTROYED” responses all got downvoted. So the community seems to agree that it’s okay as long as I explain my reasoning :)
The AI reveals a personal secret of the Gatekeeper's, which (so far as the Gatekeeper can think it through) could not possibly be known by the AI if it were still safely inside the box. [...] I'm willing to believe that an AI making better use of its data than we do could figure out personal secrets that we would think impossible— possibly even ones that I'd think impossible, even given this very belief.

I would kind of assume that any AI smarter than me could deduce things that seem impossible to me. Then again, I've read the sequences. Is the Gatekeeper supposed to have read the sequences?