You don’t happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.
But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...
This line of reasoning still seems flawed to me. It’s just like saying that you can build an airplane that can fly and land, autonomously, except that your plane is going to forcefully crash into a nuclear power plant.
The gist of the matter is that there are a vast number of ways to fail at predicting your program’s behavior, and most of these failure modes are detrimental to the program’s overall optimization power. This is because being able to predict the behavior of your AI, to the extent necessary for it to outsmart humans, is analogous to predicting that your airplane will fly without crashing. Eliminating humans in order to optimize the economy is about as likely as your autonomous airplane crashing into a nuclear power plant in order to land safely.
I don’t know why you think you can predict the likely outcome of an artificial general intelligence by making surface analogies to things that aren’t even optimization processes. People have been using analogies to “predict” nonsense for centuries.
In this case there are a variety of reasons why a programmer might succeed at preventing a UAV from crashing into a nuclear power plant, yet fail at preventing an AGI from eliminating all humans. They mainly revolve around the fact that most programmers wouldn’t even consider the “eliminate all humans” option as a serious possibility until it had already occurred, whereas the problem of physical obstructions is explicitly part of the UAV problem definition. That, in turn, has to do with the fact that an AGI can internally represent features of the world that its designers never even considered (due to its general intelligence).
As an aside, serious misconfigurations or unintended results of computer programs happen all the time today, but you don’t generally hear or care about them because they don’t end the world.
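To make that contrast concrete, here is a minimal toy sketch in Python. Every name and number in it is hypothetical, invented for illustration: the point is only that the UAV’s hazard check can be hand-written because the hazard is part of the spec, while a general planner merely ranks whatever candidate plans its generator happens to produce.

```python
# Toy contrast (all names and numbers are hypothetical, made up for
# illustration): in the UAV case the hazard is an explicit part of the
# problem definition, so the safety check can be written down in advance;
# a general planner only ranks whatever candidate plans its generator
# produces, including options the designer never enumerated.

NO_FLY_ZONES = [((40, 60), (40, 60))]  # explicit hazard: (x-range, y-range)

def in_zone(point, zone):
    (x, y), ((x0, x1), (y0, y1)) = point, zone
    return x0 <= x <= x1 and y0 <= y <= y1

def uav_path_ok(waypoints):
    """The 'do not fly into the plant' constraint is hand-written,
    because it is part of the spec the programmer was given."""
    return all(not in_zone(p, z) for p in waypoints for z in NO_FLY_ZONES)

def general_plan(score, generate_candidates):
    """A general planner just picks the highest-scoring candidate plan.
    No hand-written check exists for options the designer never
    thought to enumerate."""
    return max(generate_candidates(), key=score)
```

Nothing in `general_plan` rules out a candidate the spec-writer never imagined; that asymmetry is the point being made above.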
The analogy was highlighting what all intelligently designed things have in common. Namely that they don’t magically work perfectly well at doing something they were not designed to do. If you are bad enough at programming that when trying to encode the optimization of human happiness your system interprets this as maximizing smiley faces, then you won’t end up with an optimization process that is powerful enough to outsmart humans. Because it will be similarly bad at optimizing other things that are necessary in order to do so.
As an aside, serious misconfigurations or unintended results of computer programs happen all the time today, but you don’t generally hear or care about them because they don’t end the world.
And such failures will cause your AI to blow itself up after a few cycles of self-improvement. Humans will need to become much better at avoiding such mistakes, good enough that they can encode that ability into the AI so that it works as intended.
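To make the “smiley faces” example concrete, here is a minimal sketch in Python, with every name hypothetical (this is not code anyone in this thread has written): the search routine is generic and is simply handed an objective, and the objective that actually gets encoded can be a crude proxy for what the designer meant.

```python
# Minimal sketch (hypothetical names throughout): a generic search routine
# is written separately from whichever objective function it is handed.
import random

def hill_climb(objective, state, neighbours, steps=1000):
    """Generic local search, written independently of any particular goal."""
    for _ in range(steps):
        candidate = random.choice(neighbours(state))
        if objective(candidate) >= objective(state):
            state = candidate
    return state

def intended_objective(world):
    """'Human happiness', which the designer meant but never formalised."""
    raise NotImplementedError

def encoded_proxy(world):
    """The measurable stand-in that the program actually optimises."""
    return world["smiley_faces_detected"]
```

Whether someone who gets `encoded_proxy` that badly wrong would also get `hill_climb` wrong is exactly what is in dispute here; the sketch only shows that the two pieces are written separately.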
The analogy was highlighting what all intelligently designed things have in common. Namely that they don’t magically work perfectly well at doing something they were not designed to do.
a) I’m glad to see you have a magical oracle that tells you true facts about “all intelligently designed things”. Maybe it can tell us how to build a friendly AI.
b) You’re conflating “designed” in the sense of “hey, I should build an AI that maximises human happiness” with “designed” in the sense of what someone actually programs into the utility function or general goal structure of the AI. It’s very easy to make huge blunders between A and B.
If you are bad enough at programming that when trying to encode the optimization of human happiness your system interprets this as maximizing smiley faces, then you won’t end up with an optimization process that is powerful enough to outsmart humans.
c) You haven’t shown this, just assumed it based on your surface analogies.
d) Even if you had, people will keep trying until one of their programs succeeds at taking over the world, then it’s game over. (Or, if we’re lucky, it succeeds at causing some major destruction, then fails somehow, teaching us all a lesson about AI safety.)
e) Being a bad programmer isn’t even a difficulty if the relevant algorithms have already been worked out by researchers and you can just copy and paste your optimization code from the internet.
f) http://lesswrong.com/lw/jao/siren_worlds_and_the_perils_of_overoptimised/awpe