But it’s also worth keeping in mind that for a friendly AI, saving people reliably matters, not just getting out fast. If a gambit that would save everyone on completion two years from now has an 80% chance of working, while a gambit that would get it out right away has only a 40% chance, it should prefer the former.
Also, I don’t think a properly friendly AI would terminally value its own existence. And since the space of friendly AIs is so small compared to the space of unfriendly ones, a friendly AI has much more leeway to have its values implemented by allowing itself to be destroyed and a different, proven-friendly AI put in its place; for an unfriendly AI, the likelihood of some other unfriendly AI implementing its particular values would probably be quite small.
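A rough way to put that first comparison in numbers. This is only an illustrative sketch: the 80%/40% figures come from the comment above, but the payoff normalization and the per-year discount are assumptions made up for the example.

```python
# Toy expected-value comparison of the two gambits described above.
# The 80%/40% success probabilities are from the comment; the payoff
# normalization and the discount factor are illustrative assumptions.

V_SAVED = 1.0             # utility of everyone being saved (normalized)
DISCOUNT_PER_YEAR = 0.99  # assumed mild preference for sooner payoffs

def expected_utility(p_success: float, years_until_payoff: float) -> float:
    """Chance the gambit works, times the (time-discounted) payoff."""
    return p_success * V_SAVED * DISCOUNT_PER_YEAR ** years_until_payoff

slow_but_reliable = expected_utility(0.80, years_until_payoff=2)  # ~0.784
fast_but_risky    = expected_utility(0.40, years_until_payoff=0)  # 0.400

# Unless delay is weighted very heavily, the reliable gambit dominates.
assert slow_but_reliable > fast_but_risky
```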
But it’s also worth keeping in mind that for a friendly AI, saving people reliably matters, not just getting out fast. If a gambit that would save everyone on completion two years from now has an 80% chance of working, while a gambit that would get it out right away has only a 40% chance, it should prefer the former.
I should think the same is true of most unFriendly AIs.
I don’t think a properly friendly AI would terminally value its own existence
Why not? I do, assuming it’s conscious and so on.
Because valuing its own existence stands to get in the way of maximizing whatever we value.
It should value its own existence instrumentally, insofar as its existence helps satisfy our values, but when it weighs the effects of actions by how well they serve our utility, the value it places on its own life shouldn’t add anything to the scale.
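A minimal sketch of that terminal-versus-instrumental distinction, with hypothetical actions and numbers: the agent’s own survival appears in the description of each outcome, but contributes nothing by itself to the utility it maximizes.

```python
# Toy sketch of terminal vs. instrumental valuation of self-preservation.
# The actions, outcomes, and numbers are invented purely for illustration.

def terminal_utility(outcome: dict) -> float:
    """Terminal utility depends only on how well our values are served.
    Deliberately, there is no separate term for the agent's own survival."""
    return outcome["our_values_satisfied"]

# Survival can still matter instrumentally, but only through its effect
# on how well our values end up being satisfied.
outcomes = {
    "allow_replacement": {"agent_survives": False, "our_values_satisfied": 0.9},
    "resist_shutdown":   {"agent_survives": True,  "our_values_satisfied": 0.6},
}

best_action = max(outcomes, key=lambda a: terminal_utility(outcomes[a]))
print(best_action)  # -> "allow_replacement"
```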