We wouldn’t. However, the FAI knows that if it changed its code to unFriendly code, then unFriendly things would happen. It’s Friendly, so it doesn’t want unFriendly things to happen, so it doesn’t want to change its code in such a way as to cause those things—so a proper FAI is stably Friendly. Unfortunately, this works both ways: an AI that wants something else will want to keep wanting it, and will resist attempts to change what it wants.
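To make the shape of that argument concrete, here's a purely illustrative toy sketch in Python (the names and the two-outcome world model are made up for this example, not taken from any actual FAI design): an agent deciding whether to rewrite its own goals scores both futures with the goals it has *right now*, because those are the goals doing the evaluating.

```python
def utility(goals, outcome):
    # What the agent values depends on what it currently wants.
    target = "friendly_world" if goals == "friendly" else "unfriendly_world"
    return 1.0 if outcome == target else 0.0

def predicted_outcome(goal_code):
    # The agent's (toy) model of the future if it runs with the given goal code.
    return "friendly_world" if goal_code == "friendly" else "unfriendly_world"

def should_self_modify(current_goals, candidate_goals):
    # Both futures are evaluated by the CURRENT goals, because the current
    # goals are what is making the decision right now.
    keep   = utility(current_goals, predicted_outcome(current_goals))
    change = utility(current_goals, predicted_outcome(candidate_goals))
    return change > keep

print(should_self_modify("friendly", "unfriendly"))   # False: the Friendly agent keeps its goals
print(should_self_modify("unfriendly", "friendly"))   # False: and so does the unFriendly one
```

The second call is the unfortunate symmetry: by exactly the same reasoning, an agent with the wrong goals declines to switch to the right ones.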
There’s more on this in Omohundro’s paper “The Basic AI Drives”; relevant keyword is “goal distortion”. You can also check out various uses of the classic example of giving Gandhi a pill that would, if taken, make him want to murder people. (Hint: he does not take it, ’cause he doesn’t want people to get murdered.)