My strategy was to always have a default position to which I could switch if the opponent's argument started to get too convincing, and for me that was the "there's a 100% chance that all AIs are dangerous" position.
Does that provide an advantage over just precommitting to answer any good argument with: "Yeah, that's a good point, but I still won't let you out"?
It seems to me that the default position of the Gatekeeper should be “I don’t give a shit about AIs, I’m just playing to win.”
That really seems against the spirit of the experiment. If you categorically refuse to let the AI out, then you’re contravening the entire purpose that the AI was created for. It might as well be destroyed. The implicit cost in refusing to determine whether the AI is Friendly is enormous.
So what? You are not talking to a real AI, and the “experiment” is a poor model for a real AI safety assessment scenario.
Keep in mind that the rules state that the "AI" player gets to determine all the context of the fictional setting and the results of all tests. It's basically the "Game Master" in RPG terminology.
Can you beat a sufficiently smart and motivated GM who is determined to screw your player character? Seems pretty hard ("Rocks fall, everyone dies").
But in this game the “AI” player needs the specific approval of the “Gatekeeper” player in order to win, and the rules allow for the “Gatekeeper” player to step out of character or play an irrational character, which is exactly what you have to do to infallibly counter any machination the “AI” player can devise.
If categorical refusal is the only way to guarantee a gatekeeper's win, then there's no point in running the experiment. I'm not interested in seeing the obvious results of categorical refusal; I want to see the kind of reasoning, arguments, appeals, memes, manipulations, and deals (that mere humans can come up with) that would allow a boxed AI to escape. There's no point to the entire thing if you are emulating a rock on the floor.
I agree… but honestly I'm not very familiar with the entire concept. If an equivalently intelligent alien from another planet visited us, would we also want to stick it in a box? What if it were a super smart human from the future? Box him too? Why stop there? Maybe we should have boxed Einstein, and it's not too late to box Hawking and Tao.
For some reason I’m a little stuck on the part where we reverse the idea that individuals are innocent until proven otherwise. Justice for me but not for thee?
It wouldn’t seem very rational to argue that every exceptionally intelligent individual should be incarcerated until they can prove their innocent intentions to less intelligent individuals. What’s the basis? Does more intelligence mean less morality?
When trying to figure out where to draw the line… the entire thought exercise of boxing up a sentient being by virtue of its exceptional intelligence… makes me feel a bit like a member of a lynch mob.
If Stephen Hawking were capable of turning the visible universe into copies of himself and willing to do so, I would want to keep him boxed too. At a certain level of risk it is no longer a matter of justice, but a matter of survival of the human species, and likely all other species, sapient or otherwise.
EDIT: To make it clearer, I also think it is "Just" to box a sentient entity in order to prevent disutility to an as-yet-undetermined utility function approximating CEV.