Don’t know if this has been answered, or where to even look for it, but here goes.
Once FAI is achieved and we are into the Singularity, how would we stop this superintelligence from rewriting its “friendly” code to something else and becoming unfriendly?
We wouldn’t. However, the FAI knows that if it changed its code to unFriendly code, then unFriendly things would happen. It’s Friendly, so it doesn’t want unFriendly things to happen, and therefore it doesn’t want to change its code in a way that would cause them; a proper FAI is stably Friendly. Unfortunately, this works both ways: an AI that wants something else will want to keep wanting it, and will resist attempts to change what it wants.
There’s more on this in Omohundro’s paper “The Basic AI Drives”; the relevant section is the one on AIs wanting to preserve their utility functions. You can also check out various uses of the classic example of giving Gandhi a pill that would, if taken, make him want to murder people. (Hint: he does not take it, ’cause he doesn’t want people to get murdered.)
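To make the goal-preservation argument concrete, here is a minimal toy sketch in Python. All the names and the “world model” are made up for illustration and bear no resemblance to a real agent design; the only point is that candidate actions, including “rewrite my own goals,” get scored by the utility function the agent has right now.

```python
def friendly_utility(outcome: str) -> float:
    """Current (Friendly) utility function: values human flourishing."""
    return {"humans_flourish": 1.0, "humans_harmed": -1.0}.get(outcome, 0.0)

def predicted_outcome(action: str) -> str:
    """Crude stand-in for the agent's world model."""
    return {
        "keep_current_goals": "humans_flourish",    # keep optimizing for Friendliness
        "rewrite_to_unfriendly": "humans_harmed",   # a successor would pursue the new goals
    }.get(action, "nothing_happens")

def choose(actions, current_utility):
    """Rate each action by feeding its predicted outcome through the
    utility function the agent currently has."""
    return max(actions, key=lambda a: current_utility(predicted_outcome(a)))

print(choose(["keep_current_goals", "rewrite_to_unfriendly"], friendly_utility))
# -> keep_current_goals: the rewrite is judged by the current goals, which
#    disvalue the outcome it leads to, so the agent declines it. The same
#    logic makes an already-unFriendly agent resist being rewritten to be
#    Friendly; stability cuts both ways.
```

The only thing doing the work here is that prospective self-modifications are evaluated with the goals the agent already has, not with the goals it would have afterward.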