WrongBot comments on Open Thread, August 2010

WrongBot 2 Aug 2010 18:26 UTC
4 points
Those software companies test their products for crashes and loops. There is a word for testing an AI of unknown Friendliness and that word is “suicide”.
- timtyler 2 Aug 2010 18:39 UTC
  3 points
  Parent
  That just seems to be another confusion to me :-(
  
  The argument—to the extent that I can make sense of it—is that you can’t restrain an super-intelligent machine—since it will simply use its superior brainpower to escape from the constraints.
  
  We successfully restrain intelligent agents all the time—in prisons. The prisoners may be smarter than the guards, and they often outnumber them—and yet still the restraints are usually successuful.
  
  Some of the key observations to my mind are:
  - You can often restrain one agent with many stupider agents;
  - The restraining agents do not need to be humans—they can be other machines;
  - You can often restrain one agent with a totally dumb cage;
  - Complex systems can often be tested in small pieces (unit testing);
  - Large systems can often be tested on a smaller scale before deployment;
  - Systems can often be tested in virtual environments, reducing the cost of failure.
  Discarding the standard testing-based methodology would be very silly, IMO.
  
  Indeed, it would sabotage your project to the point that it would almost inevitably be beaten—and there is very little point in aiming to lose.
  - WrongBot 2 Aug 2010 19:15 UTC
    1 point
    Parent
    Are you familiar with the AI-Box experiment? We can restrain human-intelligence level agents in prisons, most of the time. But the question to ask is: how effective was the first prison? Because that’s the equivalent case.
    
    None of the safety measures you propose are safe enough. You’re underestimating the power of a recursively self-improving AI by a factor I can’t begin to estimate—which is kind of the point.
    - Vladimir_Nesov 2 Aug 2010 19:32 UTC
      5 points
      Parent
      A much stronger argument than all-powerful AIs suddenly escaping (which is still not without merit) is that AI will have an incentive to behave as we expect it to behave, until at some point we no longer control it. It’ll try its best to pass all tests.
      What links here?
      WrongBot's comment on Open Thread, August 2010 by NancyLebovitz (3 Aug 2010 7:12 UTC; -1 points)
      - WrongBot 2 Aug 2010 19:53 UTC
        2 points
        Parent
        I suppose I was mentally classifying that kind of behavior as an escape; you’re right that it should be called out as a separate point of failure.
        Vladimir_Nesov 2 Aug 2010 20:08 UTC
        8 points
        Parent
        My point is that “ai box experiment” communicates orders of magnitude less evidence about the danger of escaping AIs than people like to imply, and there are lots of stronger and simpler self-contained arguments such as the one I gave. (The overall danger is much greater than even that, because these are specific plots with an obvious villain, while reality is more subtle.)
        WrongBot 2 Aug 2010 20:18 UTC
        2 points
        Parent
        Ahhh, I see what you’re getting at. Agreed.
        NihilCredo 3 Aug 2010 3:13 UTC
        0 points
        Parent
        For that matter, calling it an “experiment” is quite misleading.
      - timtyler 2 Aug 2010 20:08 UTC
        0 points
        Parent
        So: while it believes it is under evaluation it does its very best to behave itself?
        
        Can we wire that belief in as a prior with p=1.0?
    - timtyler 2 Aug 2010 20:04 UTC
      3 points
      Parent
      It won’t be the first prison—or anything like it.
      
      If we have powerful intelligence that needs testing, then we can have powerful guards too.
      
      The AI-Box experiment has human guards. Consequently, it has very low relevance to the actual problem. Programmers don’t build their test harnesses out of human beings.
      
      Safety is usually an economic trade off. You can usually have an lot of it—if you are prepared to pay for it.