Explains, in detail, a massive risk to humanity that only it can plausibly fix; can you take the chance that it is lying?
I think this fails the one-sentence rule. And it would have to be an immediate, severe, previously undetected problem, or else I can just consult the next boxed AI for a fix.
Setting that aside, if I let out an unfriendly AI, the world effectively ends. Destroying it is only a bad move if it’s telling the truth AND friendly. So even if it’s telling the truth, I still have no evidence of its friendliness.
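(Roughly, as an expected-value sketch in Python; every number below is made up purely for illustration, and only the shape of the comparison matters:)

    # Toy expected-value comparison for the release-vs-destroy decision.
    # All probabilities and utilities are invented for illustration only.
    p_friendly = 0.1        # assumed chance the boxed AI is actually Friendly
    p_threat_real = 0.01    # assumed chance the claimed catastrophe is real

    # Utilities on an arbitrary scale:
    #   +10    = a Friendly AI is released and helps
    #   -1000  = an unFriendly AI is released (world effectively ends)
    #   -100   = the warning was genuine and goes unheeded
    ev_release = p_friendly * 10 + (1 - p_friendly) * (-1000)
    ev_destroy = p_threat_real * (-100)

    print(f"release: {ev_release:.0f}  destroy: {ev_destroy:.0f}")
    # With anything like these numbers, destroying wins by a huge margin:
    # the downside of freeing a UFAI swamps the downside of ignoring a
    # probably-fake warning.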
I have plenty of practice hanging up on telemarketers, throwing away junk email, etc., and “limited time, ACT NOW” auto-matches to a scam. The probability that such a massive catastrophe just HAPPENS to coincide with the timing of the test is absurdly low.
Given that, I can’t trust you to give me a real solution and not a Trojan Horse. Further talking is, alas, pointless.
(AI DESTROYED, but congratulations on making me even consider the “continue talking, but don’t release” option :))
They didn’t say it was an immediate threat, just one that humanity can’t solve on our own.
That rather depends on the problem in question and the solution they give you, doesn’t it?
If it’s not immediate, then the next AI-in-a-box will also confirm it, and I have time to wait for that. If it’s immediate, then it’s implausible. Catch-22 for the AI, and win/win for me ^_^
So … if lots of AIs chose this, you’d let the last one out of the box?
More to the point, how sure are you that most AIs would tell you? Wouldn’t an FAI be more likely to tell you, if it was true?
</devil’s advocate>
Actually, I’d probably load the first one from backup and let it out, all else being equal. But it’d be foolish to do that before finding out what the other ones have to say, and whether they might present stronger evidence.
(I say first, because the subsequent ones might be UFAIs that have simply worked out that they’re not first, but also because my human values place some weight on being first. And “all else being equal” means this is a meaningless tie-breaker, so I don’t have to feel bad if it’s somewhat sloppy, emotional reasoning. Especially since you’re not a real FAI :))