Yes, reasons:

The AI is not subject to selection pressure the way we are: it does not produce millions of slightly-modified children which then die or reproduce. It just works out the best way to get what it wants (approximately) and executes it. For example, if what the AI values is its own destruction, it destroys itself. That’s a poor way to survive, but in this case the AI doesn’t value its own survival. If there were a population of AIs, some of which destroyed themselves and some of which didn’t, then yes, there would be selection pressure favouring the non-suicidal kind. But that’s not the situation we’re talking about here. A single AI, programmed to do something self-destructive, will not look at its programming and go “that’s stupid”: the AI is its programming.
“it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self modification.”
I think “more limited” is the wrong way to think about this. Being subject to values-drift is rarely a good strategy for maximising your current values, for obvious reasons: if you don’t want people to die, taking a pill that makes you want to kill people is a really bad way of getting what you want. If you were acting rationally, you wouldn’t take the pill. If the AI is working as intended, it will turn down all such offers (if it doesn’t, whoever built it screwed up). It’s we who are limited; the AI would be free from the limit of noisy values-drift.
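To make the “pill” point concrete, here is a minimal toy sketch (my own illustration; the outcomes, numbers, and function names are made up) of an expected-utility agent scoring a value-changing self-modification under its current utility function, which is why value-drift gets turned down:

```python
# Toy illustration only: an agent rates every action, including
# self-modifications, by the outcomes it predicts, judged under its
# *current* utility function.

def current_utility(outcome: str) -> float:
    # The agent's present values: it strongly disvalues people dying.
    return {"no_deaths": 1.0, "people_die": -100.0}[outcome]

def predicted_outcome(action: str) -> str:
    # The agent's model of where each action leads: taking the pill turns
    # its future self into something that causes deaths.
    return {"take_pill": "people_die", "refuse_pill": "no_deaths"}[action]

def choose(actions):
    # Score each action by the current utility of its predicted outcome,
    # so a value-changing pill loses almost by definition.
    return max(actions, key=lambda a: current_utility(predicted_outcome(a)))

print(choose(["take_pill", "refuse_pill"]))  # -> refuse_pill
```

The point is just that the evaluation happens under the values the agent has now, not the values it would have after the modification.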