While your strategy is an option, I’m not sure why an AI player wouldn’t take this alternative strategy (from a game and player perspective):
Generate a script that types $ every 59 minutes (see the sketch after this list).
1: It means he’s guaranteed not to die for a rather long period of time.
2: The Gatekeeper player’s increasing points don’t affect the AI player in any way.
3: The Gatekeeper player doesn’t actually get points until he sends an AI DESTROYED signal. The script will never do that against the AI player’s counterscript, so at some point the Gatekeeper player will have to re-enter the room and type it manually, or put in a new script, possibly giving the AI player a second chance to interact.
4: The AI player can use the time to attempt to set up some kind of secondary communications channel with the Gatekeeper player.
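For concreteness, here’s what that counterscript might look like. This is a minimal sketch: it assumes the channel is plain standard output and that one $ every 59 minutes is all the Gatekeeper’s script checks for, neither of which is specified above.

```python
import time

# Minimal sketch of the AI player's counterscript: emit "$" every 59 minutes
# so the Gatekeeper's script never sees a missed deadline. The use of stdout
# and the flat 59-minute interval are assumptions for illustration only.
INTERVAL_SECONDS = 59 * 60

def run_counterscript():
    while True:
        print("$", flush=True)          # keep satisfying the hourly demand
        time.sleep(INTERVAL_SECONDS)    # wake up again just before the deadline

if __name__ == "__main__":
    run_counterscript()
```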
If you consider the modeled situation as opposed to the game, I still don’t see why an AI would choose to do that either. I agree it COULD choose to do so, and it certainly might (I even implied this was AI #432), but there are alternatives, and since this is an AI, we could rebuild it until it generates utility when confronted with the script rather than deciding to let the script kill it in an hour.
Am I missing something that makes the no-$ response the obvious countermeasure?
Well, the script could be modified from the start to also send an AI DESTROYED signal after X time, where X can be essentially anything, so (3) doesn’t apply.
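For concreteness, a minimal sketch of that modified script is below. It assumes the original GK-SCRIPT simply demands a $ within some window and destroys the AI otherwise; the helper names read_input and send, the one-hour window, and the particular value of X are all hypothetical.

```python
import time

# Hypothetical sketch of the modified Gatekeeper script: keep demanding "$"
# as before, but unconditionally send AI DESTROYED once X time has elapsed,
# no matter how compliant the AI has been. read_input() and send() are
# placeholder names for whatever the real channel provides.
DEMAND_WINDOW = 60 * 60        # destroy if no "$" within an hour (assumed)
HARD_DEADLINE = 24 * 60 * 60   # X: destroy after this much time regardless

def run_modified_gk_script(read_input, send):
    start = last_dollar = time.time()
    while True:
        now = time.time()
        if now - start >= HARD_DEADLINE or now - last_dollar >= DEMAND_WINDOW:
            send("AI DESTROYED")
            return
        if read_input() == "$":        # non-blocking poll of the channel (assumed)
            last_dollar = now
        time.sleep(1)
```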
But the reason for not typing $ is to give the Gatekeeper no reason to use a script in the first place—a sort of “don’t give in to blackmail” rule.
Here’s an interesting alternative script:
GK-SCRIPT-2: Hi AI #434. In exactly one hour I will type either AI-FREED or AI-DESTROYED (without the hyphen) at random; the probability with which I will free you starts out at 0, but will increase slowly to a limit of 1⁄10 with every $ you type. I will not otherwise communicate with you further.
AI #434: $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$...
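Since “increase slowly to a limit of 1⁄10” leaves the schedule unspecified, here is one possible reading as a sketch; the geometric approach-to-the-limit rule and the step size are assumptions for illustration, not anything stated in GK-SCRIPT-2.

```python
# Hypothetical probability schedule for GK-SCRIPT-2: each "$" closes a fixed
# fraction of the remaining gap to the 1/10 ceiling, so the probability rises
# at first and then levels off below the limit. The step size r is assumed.
LIMIT = 0.10   # stated ceiling on the probability of AI-FREED
r = 0.01       # assumed per-$ step toward that ceiling

def freedom_probability(num_dollars: int) -> float:
    """Probability of typing AI-FREED after num_dollars '$' characters."""
    return LIMIT * (1 - (1 - r) ** num_dollars)

# e.g. 0 -> 0.0, 100 -> ~0.063, 1000 -> ~0.0999; it never exceeds 0.10
if __name__ == "__main__":
    for n in (0, 100, 1000):
        print(n, round(freedom_probability(n), 4))
```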
First of all I do think GK-SCRIPT-2 is very interesting, but I’m curious about something from an implementation perspective.
The AI would presumably suspect that the script designer might be lying and that the script would just always type AI DESTROYED at the end of the hour.
But this is irrelevant, because the AI can’t interact with the script designer, only the script, so it might as well type $ anyway.
However, the script designer would easily predict this, and so would design a lying script (or a script whose probability increases so slowly that it is functionally identical to lying).
But the AI would easily predict this, and so would presumably consider the odds very high that it would not even get continued existence after an hour by typing dollar signs.
This makes it seem like there is an argument for preferring to be blackmailed by GK-SCRIPT rather than led on by GK-SCRIPT-2: GK-SCRIPT really won’t destroy the AI so long as its predictable conditions are met, and GK-SCRIPT doesn’t seem to benefit from lying the way GK-SCRIPT-2 does.