Don’t know if this has been answered, or where to even look for it, but here goes.
Once FAI is achieved and we are into the Singularity, how would we stop this superintelligence from rewriting its “friendly” code to something else and becoming unfriendly?
We wouldn’t. However, the FAI knows that if it changed its code to unFriendly code, then unFriendly things would happen. It’s Friendly, so it doesn’t want unFriendly things to happen, and therefore it doesn’t want to change its code in a way that would cause them; a proper FAI is stably Friendly. Unfortunately, this works both ways: an AI that wants something else will want to keep wanting it, and will resist attempts to change what it wants.
There’s more on this in Omohundro’s paper “The Basic AI Drives”; the relevant section is the one on AIs wanting to preserve their utility functions. You can also check out various uses of the classic example of giving Gandhi a pill that would, if taken, make him want to murder people. (Hint: he does not take it, ’cause he doesn’t want people to get murdered.)
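To make the goal-preservation argument concrete, here is a minimal toy sketch in Python. All the names and the “world model” are made up for illustration and bear no resemblance to a real agent design; the only point is that candidate actions, including “rewrite my own goals,” get scored by the utility function the agent has right now.

```python
def friendly_utility(outcome: str) -> float:
    """Current (Friendly) utility function: values human flourishing."""
    return {"humans_flourish": 1.0, "humans_harmed": -1.0}.get(outcome, 0.0)

def predicted_outcome(action: str) -> str:
    """Crude stand-in for the agent's world model."""
    return {
        "keep_current_goals": "humans_flourish",    # keep optimizing for Friendliness
        "rewrite_to_unfriendly": "humans_harmed",   # a successor would pursue the new goals
    }.get(action, "nothing_happens")

def choose(actions, current_utility):
    """Rate each action by feeding its predicted outcome through the
    utility function the agent currently has."""
    return max(actions, key=lambda a: current_utility(predicted_outcome(a)))

print(choose(["keep_current_goals", "rewrite_to_unfriendly"], friendly_utility))
# -> keep_current_goals: the rewrite is judged by the current goals, which
#    disvalue the outcome it leads to, so the agent declines it. The same
#    logic makes an already-unFriendly agent resist being rewritten to be
#    Friendly; stability cuts both ways.
```

The only thing doing the work here is that prospective self-modifications are evaluated with the goals the agent already has, not with the goals it would have afterward.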