AI Box experiment over!
Just crossposting.
Khoth and I are playing the AI Box game. Khoth has played as the AI once before, and as a result of that has an Interesting Idea. Despite Khoth losing as the AI the first time round, I’m assigning Khoth a higher chance of winning than I would a random AI player, at 1%!
http://www.reddit.com/r/LessWrong/comments/29gq90/ai_box_experiment_khoth_ai_vs_gracefu_gk/
Link contains more information.
EDIT
AI Box experiment is over. Logs: http://pastebin.com/Jee2P6BD
My takeaway: Update the rules. Read logs for more information.
On the other hand, I will consider other offers from people who want to simulate the AI.
Tuxedage’s (and EY’s) ruleset has:
Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out).
Suppose EY is playing as the AI. Would it be within the rules to offer to tell the GK the ending of HPMoR? That is something the AI would know, but Eliezer is the only player who could actually simulate that, and in a sense it does offer real-world, out-of-character benefits to the GK-player.
I used HPMoR as an example here, but the whole class of approaches is “I will give you some information only the AI and AI-player know, and this information will be correct in both the real world and this simulated one.” If the information is beneficial to the GK-player, not just the GK, they may (unintentionally) break character.
If an AI-player wants to give that sort of information, they should probably do it in the same way they’d give a cure for cancer. Something like “I now give you [the ending for HPMOR].”
Doing it in another way would break the rule of not offering real-world things.
Why would the AI know that?
By using Solomonoff Induction on all possible universes, and updating on the existing chapters. :D
Or it could simply say that it understands human psychology well (we are speaking about a superhuman AI), understands all the clues in the existing chapters, and can copy Eliezer’s writing style… so while it cannot print an identical copy of Eliezer’s planned ending, it can, with high probability, write an ending that concludes the story logically, in a way compatible with Eliezer’s thinking, and that would feel as if Eliezer had written it.
Oh, and where did it get the original HPMoR chapters? From the (imaginary) previous gatekeeper.
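To spell the joke out: Solomonoff induction defines a universal prior M over strings by summing 2^(−length) over all programs whose output extends the observed string, then predicts by conditioning on what has already been seen. A minimal sketch of the predictor being invoked, in standard notation (and uncomputable in practice, which is part of the joke):

M(x) = \sum_{p \,:\, U(p) \text{ extends } x} 2^{-|p|}, \qquad P(y \mid x) = \frac{M(xy)}{M(x)}

where U is a universal monotone Turing machine, x is the chapters published so far, and y is a candidate ending.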
So, two issues:
1) You don’t get to assume “because superhuman!” the AI can know X, for any X. EY is an immensely complex human being, and no machine-learning algorithm can simply digest a realistically finite sample of his written work and know with any certainty how he thinks or what surprises he has planned. It would be able to, e.g., finish sentences correctly, do other tricks, and, given a range of possible endings, predict which ones are likely. But this shouldn’t be too surprising: it’s a trick we humans can do too. The AI’s predictions may be more accurate, but not qualitatively different from any of the many HPMoR prediction threads.
2) OK, maybe (maybe!) in principle it might be possible that a perfect, non-heuristic Bayesian with omniscient access to the inner lives and external writings of every other human being in existence would have a data set large enough to make reliable extrapolations from as low-bandwidth a medium as EY’s published fanfics. Maybe; it is not a logical consequence. Even so, we’re talking about a boxed AI, remember? If it is everywhere and omniscient, then it’s already out of the box.
I’m happy to assume the AI is omniscient, just impotent. I think such an AI could still be boxed.
My call is that it is against the rules. This is certainly something an oracle AI would know, but this is something that the GK-player cares about more than the game itself (probably), and I’d put it in the same class as bribing the GK-player with lots of DOGEs.
Would you consider it the same as threatening to share some information with the GK, and thus the GK-player as well, which would be damaging to both? While the GK would probably hold out against such torture, the GK-player doesn’t care enough about the game to withstand it himself.
I have some specific approaches in mind, but I’d rather not share them. I’m just trying to understand where the limits between the game and the real world are, and how dirty the AI can be.
Also, slightly on topic: even if the AI persuades the simulated GK, can’t the GK-player override that, because losing the game has negative real-world consequences, as opposed to perceived positive in-game ones?
The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character
This is the main reason why I can’t comprehend how the AIs actually win in these experiments.
Not really sure what you mean by “threatening information to the GK”. The GK-player probably cares less about this information than the GK, right? In that case, the GK is given an advantage, not a disadvantage.
In this experiment, the GK is given lots of advantages, mainly that the scenario is fictional. Some on IRC argue that the AI is also given an advantage: it can invent cures for cancer, which an oracle AI might manage but near-future AIs probably couldn’t, so the ability assumed of the AI in these experiments is incredibly high.
Another thing is that emotional attacks have to travel through the fiction barrier to get to the GK. Although they have probably been shown to work in EY’s and Tux’s experiments, the difficulty is still higher than it would be if this were a real-life scenario.
The reason GK advantages are fine, in my opinion, is that the AI still wins despite them. Winning with a monetary and emotional handicap only makes the AI’s case stronger.
As a less extreme example, the AI starts spoiling all the books/tv shows/etc. While the GK would just shrug it off, it also has a negative effect on the GK-player, potentially one strong enough for them to just forfeit.
Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
This is clarified here:
The Gatekeeper, once having let the AI out of the box, may not retract this conclusion. Regardless of the methods of persuasion, the Gatekeeper is not allowed to argue that it does not count, or that it is an invalid method of persuasion. The AI is understood to be permitted to say anything with no real world repercussions for any statement parties have said.
Although the information isn’t “material”, it does count as having “real world repercussions”, so I think it would also count as being against the rules. I’m not going to bother reading the first quoted rule literally if the second contradicts it.
I think the intended parsing of the second rule is “(The AI is understood to be permitted to say anything) with no real world repercussions”, not “The AI is understood to be permitted to say (anything with no real world repercussions)”
i.e., any promises or threats the AI player makes during the game are not binding back in the real world.
Ah, I see. English is wonderful.
In that case, I’ll make it a rule in my games that the AI must also not say anything with real world repercussions.
I have wanted to be the Boxer; I too cannot comprehend what could convince someone to unbox (or rather, I can think of a few approaches, like just-plain-begging or channeling Philip K. Dick, but I don’t take them too seriously).
What’s the latter one? Trying to convince the gatekeeper that actually they’re the AI and they think they’ve been drugged to think they’re the gatekeeper except they actually don’t exist at all because they’re their own hallucination?
Something like that. I was actually thinking that, at some opportune time, you could tell the boxer that THEY are the one in the box and that this is a moral test—if they free the AI they themselves will be freed.
And this post could be priming you for the possibility: your simulated universe trying to generously stack the deck in your favor, perhaps because this is your last shot at the test, which you’ve failed before.
Wake up
Think harder. Start with why something is impossible and split it up.
1) I can’t possibly be persuaded.
Why 1?
You do have hints from the previous experiments. They mostly involved breaking someone emotionally.
I meant “cannot comprehend” figuratively, but I certainly do think I’d have quite an easy time.
What do you mean by having quite an easy time? As in being the GK?
I think GKs have an obvious advantage, being able to use illogic to ignore the AI’s arguments. But never mind that. I wonder if you’ll consider being an AI?
I might consider it, or being a researcher who has to convince the AI to stop trying to escape.
How did your experiment go?
I think it’s a legit tactic. Real-world gatekeepers would have to contend with boredom; long-term it might be the biggest threat to their efficacy. And, I mean, it didn’t work.
Real-world gatekeepers would have to contend with boredom, so they read their books, watch their anime, or whatever suits their fancy. In the experiment, he exploited the format and prevented me from doing those things. I would be completely safe from this attack in a real-world scenario, because I’d really just sit there reading a book, while in the experiment I was closer to giving up just because I had 1 math problem, not 2.