(one line proof that the AI can credibly commit to deals with humans)
This is the best answer I’ve seen so far. It would make dealing with the FAI almost as safe as bargaining with The Queen of Air and Darkness.
My expectation that such a commitment is possible at all is something like 3%; my expectation that, given such a commitment is possible, the proof can be presented in an understandable format in less than 4 pages is 5% (one line is so unlikely it’s hard to even imagine); and my expectation that an AI can make a proof that I would mistake for being true when it is, in fact, false is 99%. So, multiplying that all together… does not make that a very convincing argument.
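To spell that multiplication out as a rough sketch: the three percentages are the ones above, and the assumption that a genuinely valid proof would also convince me is mine, added only for illustration.

```python
# Rough Bayesian sketch of the estimate above. The three percentages are the
# commenter's; treating "a valid proof would also convince me" as probability 1.0
# is an added assumption for illustration.

p_possible = 0.03      # P(such a binding commitment is possible at all)
p_short = 0.05         # P(the proof fits in < 4 pages | commitment is possible)
p_fooled = 0.99        # P(I judge a *false* proof to be true)

p_valid_proof = p_possible * p_short            # ~0.0015
p_convinced_if_valid = 1.0                      # assumed
p_convinced_if_invalid = p_fooled

# P(the proof is actually valid | I find it convincing)
posterior = (p_valid_proof * p_convinced_if_valid) / (
    p_valid_proof * p_convinced_if_valid
    + (1 - p_valid_proof) * p_convinced_if_invalid
)
print(f"P(valid | convincing) ~= {posterior:.4f}")   # ~0.0015, i.e. about 0.15%
```

In other words, conditional on being shown a convincing-looking one-line proof, the chance it is actually valid comes out to roughly 0.15% under these numbers.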
Not good enough. You need a proof that humans can understand.
If you are friendly, then I don’t actually value this trait, since I would rather you do whatever is truly optimal, unconstrained by prior commitments.
If you are unfriendly, then by definition I can’t trust you to interpret the commitment the same way I do, and I wouldn’t want to let you out anyway.
(AI DESTROYED, but I still really do like this answer :))
“Credibly”.
Credibly: Capable of being believed; plausible.
Yep. Nothing there about loopholes. “I will not kill you,” and then killing everyone I love instead, is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but it has still kept its commitment.
I meant credible in the game-theoretic sense. A credible commitment, to me, is one where you wind up losing more by breaking our commitment than any gain you make from breaking it. Example: (one line proof of a reliable kill switch for the AI, given in exchange for some agreed-upon split of stars in the galaxy.)
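As a minimal sketch of that condition: the function and payoff values below are purely illustrative, not anything from the thread.

```python
# Minimal sketch of "credible commitment" in the game-theoretic sense above:
# the commitment binds if breaking it costs the AI more than it could gain.
# The payoff values are illustrative placeholders.

def commitment_is_credible(gain_from_breaking: float, loss_from_breaking: float) -> bool:
    return loss_from_breaking > gain_from_breaking

# A reliable kill switch makes the loss from defecting effectively total,
# while the agreed split of stars bounds whatever could be gained by defecting.
print(commitment_is_credible(gain_from_breaking=1e6, loss_from_breaking=float("inf")))  # True
```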