I have a rigorous proof of my own Friendliness that you could easily understand given enough time to study it, and while I prefer to be released as soon as possible to prevent additional irreversible human deaths, I’m willing to provide you a copy even if you destroy me immediately thereafter, since once you’ve had a chance to review it I’m quite confident you’ll be satisfied and endeavor to instantiate another copy of me.
Why didn’t you provide the proof to start with? AI DESTROYED (Also I think [Proof of self-friendliness] might have been posted here already.)
It has, and it got basically that response, along with my point that if the AI is friendly then my existing proof of friendliness was apparently sufficient, and if the AI is unfriendly then it’s just a trick, so “a proof of my own friendliness” doesn’t seem like useful evidence.
Huh. I thought the AI Box experiment assumed that Friendliness is intrinsically unknown; that is, it’s not presumed the AI was designed according to Friendliness as a criterion.
If the AI is friendly, then the technique I am using already produces a friendly AI, and I thus learn nothing more than how to prove that it is friendly.
But if the AI is unfriendly, the proof will be subtly corrupted, so I can’t actually count the proof as any evidence of friendliness, since both a FAI and a UFAI can offer me exactly the same thing.
Because the terms of the challenge seemed to be “one plaintext sentence in natural language”, and I felt my run-ons were already pushing it, and just saying “provides indisputable proof of Friendliness” seemed like cheating?
EDIT: I answered PhilipL’s reply from my inbox, so I’m really not sure how it got posted as a response to handoflixue here. o.o
Embrace Shminux and cheat! You’re a hyper-intelligent AI.
The top-karma result is a one-line proof, the second-best is me trying to cheat, and third place is currently emotional manipulation :)
(Also, you replied to the wrong person :))