Because, if the AI worked, it would consider the fact that if it changed its values, they would be less likely to be maximised, and would therefore choose not to change its values. If the AI wants the future to be X, changing itself so that it wants the future to be Y is a poor strategy for achieving its aims—the future will end up not-X if it does that. Yes, humans are different. We’re not perfectly rational. We don’t have full access to our own values to begin with, and if we did we might sometimes screw up badly enough that our values change. An FAI ought to be better at this stuff than we are.
I think it is extremely dangerous to assume an AI cannot employ a survival strategy that natural intelligences (NI) such as ourselves are practically defined by. Perhaps even more importantly, it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self modification.
The ultimate value, in terms of selection pressures, is survival. I don’t see a mechanism by which something that can self-modify will not ultimately wind up with values that are more conducive to its survival than the ones it started out with.
And I certainly would like to see why you assert this is true; are there reasons?
Yes, reasons:
The AI is not subject to selection pressure the same way we are: it does not produce millions of slightly-modified children which then die or reproduce themselves. It just works out the best way to get what it wants (approximately) and then executes that action. For example, if what the AI values is its own destruction, it destroys itself. That’s a poor way to survive, but then in this case the AI doesn’t value its own survival. If there were a population of AIs, some of which destroyed themselves and some of which didn’t, then yes, there would be some kind of selection pressure leading to more AIs of the non-suicidal kind. But that’s not the situation we’re talking about here. A single AI, programmed to do something self-destructive, will not look at its programming and go “that’s stupid”—the AI is its programming.
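To make the “no selection pressure” point concrete, here’s a minimal toy sketch (the action names and utility numbers are my own illustrative assumptions, not anyone’s actual design): the agent just maximises whatever utility function it was handed, and nothing in that loop favours survival unless the utility function itself does.

```python
# Toy sketch: an agent that simply maximises the utility function it was given.
# There is no population, no reproduction, and no selection pressure nudging it
# toward survival-friendly values.

def choose_action(actions, utility):
    """Pick the action that scores highest under the agent's current values."""
    return max(actions, key=utility)

# By stipulation, this utility function rewards self-destruction (illustrative only).
def values_self_destruction(action):
    return {"self_destruct": 1.0, "keep_running": 0.0}[action]

print(choose_action(["self_destruct", "keep_running"], values_self_destruction))
# -> self_destruct: not "stupid" from the inside, because the AI is its programming.
```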
it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self modification.
I think “more limited” is the wrong way to think of this. Being subject to values-drift is rarely a good strategy for maximising your values, for obvious reasons: if you don’t want people to die, taking a pill that makes you want to kill people is a really bad way of getting what you want. If you were acting rationally, you wouldn’t take the pill. If the AI is working, it will turn down all such offers (if it doesn’t, the person who created the AI screwed up). It’s we who are limited—the AI would be free from the limit of noisy values-drift.
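A worked toy version of the pill example, under the assumption that the agent scores the offer with the values it holds now (all numbers are illustrative): taking the pill changes which future it would later steer toward, and that future scores badly under its current values, so the rational choice is to refuse.

```python
# Sketch: why a value-maximiser turns down offers to change its values.
# Each option is scored by the future it leads to, evaluated with the values
# the agent holds NOW (the numbers are illustrative assumptions).

CURRENT_VALUES = {"people_alive": 1.0}    # the agent wants people alive
PILL_VALUES    = {"people_alive": -1.0}   # the values it would hold after the pill

def future_if_acting_on(values):
    # Whoever holds these values later steers the future accordingly.
    return "people_live" if values["people_alive"] > 0 else "people_die"

def score(future, values):
    return values["people_alive"] * (1 if future == "people_live" else -1)

refuse = score(future_if_acting_on(CURRENT_VALUES), CURRENT_VALUES)  # +1.0
take   = score(future_if_acting_on(PILL_VALUES),    CURRENT_VALUES)  # -1.0
print("take the pill" if take > refuse else "refuse the pill")       # -> refuse the pill
```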
Humans have changed values to maximize other values (such as survival) throughout history. That’s cultural assimilation in a nutshell. But some people choose to maximize values other than survival (e.g. every martyr ever). And that hasn’t always been pointless—consider the value to the growth of Christianity created by the early Christian martyrs.
If an AI were faced with the possibility of self-modifying to reduce its adherence to value Y in order to maximize value X, then we would expect the AI to do so only when value X was “higher priority” than value Y. Otherwise, we would expect the AI to choose not to self-modify.
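One way to read that “higher priority” condition is as a weighted tradeoff. A minimal sketch, with made-up weights rather than anyone’s actual utility function:

```python
# Sketch of the priority condition: self-modify only if the weighted gain to the
# higher-priority value X outweighs the weighted loss to value Y.
# (Weights and deltas are illustrative assumptions.)

def should_self_modify(weight_x, gain_x, weight_y, loss_y):
    """True only when what X gains is worth more than what Y loses."""
    return weight_x * gain_x > weight_y * loss_y

print(should_self_modify(weight_x=2.0, gain_x=1.0, weight_y=1.0, loss_y=1.0))  # True: X outranks Y
print(should_self_modify(weight_x=1.0, gain_x=1.0, weight_y=2.0, loss_y=1.0))  # False: stays put
```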