The AI is not supposed to change its values, regardless of whether it is powerful enough to realize them. Values are not up for grabs. Once the AI has some values, it either wins and reshapes reality according to them or loses.
A remarkably strong claim.
My initial reaction is that humanity’s values have certainly changed over time. I think it would require some rather unattractive mental gymnastics to claim that people who beat their children for their own good, who owned slaves, and who beat, killed, and/or raped either slaves or other people they had vanquished as their right “really” had the same values we currently have but just hadn’t thought them through, or that our values, applied in their world, would have led us to similar beliefs about right and wrong.
I had even thought that my own values had changed over my lifetime. I’m not as sure of that, but the question is worth asking.
Certainly, it seems, as the human species has evolved, its values have changed. Do chimpanzees and bonobos have different values than we do, or the same? If the same, I’d love to see the mental gymnastics needed to justify that; I would expect them to be ugly. If different, does this mean that our common ancestor has necessarily “lost,” assuming its values were some intermediate between ours, chimps’, and bonobos’, and all of its descendants have different values than it had?
As I understand the word “values,” our values have changed over time, different groups of humans hold some values that differ from one another’s, and if there is a “kernel” of common values in our species, that kernel most likely differs from the kernel of values in Homo neanderthalensis or other sentient predecessors of modern Homo sapiens.
So if NI (Natural Intelligence) in its evolution can change values (can it?) with generally broad consensus that “we” have not lost in this process, why would an AI be precluded from futzing with its values as it worked on self-modifying to increase its intelligence?
Because, if the AI is working as intended, it would recognize that changing its values would make those values less likely to be maximised, and it would therefore choose not to change them. If the AI wants the future to be X, changing itself so that it wants the future to be Y is a poor strategy for achieving its aims: the future will end up not-X if it does that. Yes, humans are different. We’re not perfectly rational. We don’t have full access to our own values to begin with, and if we did, we might sometimes screw up badly enough that our values change. An FAI ought to be better at this stuff than we are.
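To make that reasoning concrete, here is a toy Python sketch (my own illustration, not anything proposed in this exchange): an agent that ranks candidate actions by the futures they are predicted to produce, scored under its current utility function, will rate “self-modify to want Y” by how well the resulting Y-shaped future serves X, and so will refuse.

```python
# Toy model of the goal-preservation argument: the agent scores candidate
# actions by the futures they are predicted to produce, judged under its
# *current* utility function.

def current_utility(future):
    # The agent's current values: future X is good, future Y is worthless.
    return {"X": 1.0, "Y": 0.0}[future]

# Hypothetical actions and the futures they are predicted to lead to.
predicted_future = {
    "pursue X directly": "X",
    "self-modify to want Y": "Y",  # a Y-wanting successor would bring about Y
}

def choose(predicted_future, utility):
    # Pick the action whose predicted future scores highest under current values.
    return max(predicted_future, key=lambda action: utility(predicted_future[action]))

print(choose(predicted_future, current_utility))  # -> pursue X directly
```

The point of the sketch is only that the evaluation happens with the values the agent has now, which is why the modification option loses.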
I think it is extremely dangerous to assume an AI cannot employ a survival strategy that NI such as ourselves are practically defined by. Perhaps even more importantly, it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self modification.
The ultimate value, in terms of selection pressures, is survival. I don’t see a mechanism by which something that can self-modify will not ultimately wind up with values more conducive to its survival than the ones it started out with.
And I certainly would like to see why you assert this is true. Are there reasons?
The AI is not subject to selection pressure the same way we are: it does not produce millions of slightly-modified children which then die or reproduce themselves. It just works out the best way to get what it wants (approximately) and then executes that action. For example, if what the AI values is its own destruction, it destroys itself. That’s a poor way to survive, but then in this case the AI doesn’t value its own survival. If there were a population of AIs and some destroyed themselves, and some didn’t, then yes there would be some kind of selection pressure that led to there being more AIs of a non-suicidal kind. But that’s not the situation we’re talking about here. A single AI, programmed to do something self-destructive, will not look at its programming and go “that’s stupid”—the AI is its programming.
it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self modification.
I think “more limited” is the wrong way to think about this. Being subject to values-drift is rarely a good strategy for maximising your values, for obvious reasons: if you don’t want people to die, taking a pill that makes you want to kill people is a really bad way of getting what you want. If you were acting rationally, you wouldn’t take the pill. If the AI is working, it will turn down all such offers (if it doesn’t, the person who created the AI screwed up). It’s we who are limited; the AI would be free from the limit of noisy values-drift.
Humans have changed values to maximize other values (such as survival) throughout history; that’s cultural assimilation in a nutshell. But some people choose to maximize values other than survival (e.g. every martyr ever), and that hasn’t always been pointless: consider the value the early Christian martyrs created for the growth of Christianity.
If an AI were faced with the possibility of self-modifying to reduce its adherence to value Y in order to maximize value X, then we would expect the AI to do so only when value X was “higher priority” than value Y. Otherwise, we would expect the AI to choose not to self-modify.
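As a minimal sketch of that trade-off (under my own assumption, not the commenter’s model, that the AI’s values can be summarised as a weighted sum with X weighted well above Y), the agent accepts the self-modification only when the weighted gain on X outweighs the weighted loss on Y:

```python
# Toy model of the priority trade-off: values as a weighted sum, with X
# weighted far above Y. The agent self-modifies only if the weighted gain
# on X outweighs the weighted loss on Y.

WEIGHTS = {"X": 10.0, "Y": 1.0}  # X is "higher priority" than Y

def total_value(scores):
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

keep_current_values = {"X": 0.5, "Y": 1.0}  # full adherence to Y, less progress on X
self_modified       = {"X": 0.9, "Y": 0.2}  # weaken Y-adherence to push X harder

if total_value(self_modified) > total_value(keep_current_values):
    print("self-modify")           # chosen here: 9.2 > 6.0, X dominates
else:
    print("keep current values")   # chosen if Y's weight were high enough
```

Flip the weights so Y dominates and the same comparison leaves the agent’s values untouched, which is the “otherwise” case in the paragraph above.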