Presumably, the problems of friendly or unfriendly AI are just like the problems of friendly or unfriendly NI (Natural Intelligence). Intelligence seems more like an agency, a tool, with friendliness or unfriendliness a largely orthogonal consideration. In the case of humans, I would imagine our values are largely dictated by “what worked.” That is, societies and even subspecies with different values would undergo natural selection pressures proportional to how effective those values were at adding to the survival and flourishing of the group possessing them.
Suppose, as this group generally does, that self-modifying AI will have the ability to modify itself by design, and that one of the values it designs toward is higher intelligence. Is such an evolution constrained by evolution-like pressures or is it not?
The argument that it is not is that it is changing so fast, and so far ahead of any conceivable competition, that from the point of view of the evolution of its values, it is running “open loop.” That is, the first AI to go FOOM is so far superior in ability to anything else in the world that its subsequent steps of evolution are unconstrained by any outside pressures, and either follow some sort of internal logic of value-change as intelligence increases, or else follow no logic at all, going in some sense on a “random walk” through possible values. That is, with quickly increasing intelligence, the values of the FOOMing AI are nearly irrelevant to its overall effectiveness, and therefore totally irrelevant to determining whether it will survive and thrive going up against humans. Its intelligence is sufficient to guarantee its survival; its values get a free ride.
But is this right? Does a FOOMing AI really look like a single intelligence ramping up its own ability? This is certainly NOT the way evolution has gone about improving the intelligence of our species. Evolution tries many small modifications and then does natural experiments to see which ones do better and which do worse. By attrition it keeps the ones that did better and uses these as a base for further experiments.
My own sense of how I create using my intelligence is that I try many different things. Many are tried purely in the sandbox of my own brain, run as simulations there, and only the more promising kept for further testing and development. It seems to me that my pool of ideas is an almost random noise of “what ifs” and that my creative intelligence is the discrimination function filtering which of these ideas are given more resources and which are killed in the crib.
So intelligent creation seems to me to be very much like evolution, with competition.
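To make that concrete, here is a minimal sketch of the generate-and-filter loop I have in mind (purely illustrative; the numeric “ideas”, the score function, and all of the parameters are made-up stand-ins, not anyone’s actual proposal):

```python
import random

def evolve(seed_ideas, score, n_generations=100, keep=10, offspring_per_idea=20, noise=0.1):
    """Generate many noisy variations, then let a discrimination function keep the best."""
    population = list(seed_ideas)
    for _ in range(n_generations):
        # Generate a pool of near-random "what ifs" around the current ideas.
        candidates = [idea + random.gauss(0, noise)
                      for idea in population
                      for _ in range(offspring_per_idea)]
        # The discrimination function: only the most promising survive to the next round.
        population = sorted(candidates, key=score, reverse=True)[:keep]
    return population

# Toy usage: "ideas" are numbers, and the environment rewards being close to 42.
best = evolve(seed_ideas=[0.0], score=lambda x: -abs(x - 42))
print(best[0])
```

The point is only the shape of the loop: cheap, noisy generation followed by a ruthless filter, whether the filter is attrition in the world or discrimination inside a single brain.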
Might we expect an AI to do something like this? To essentially hypothesize various modifications to itself, and then to test the more promising ones by running them as simulations, with increasing exactitude of the sims as the various ideas are winnowed down to the best ones?
Might an AI determine that the most efficient way to do this is to actually have many competing versions of itself constantly running, essentially, against each other? Might the FOOMing of an AI look a lot like the FOOMing of NI, which is what is going on on our planet right now?
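A toy sketch of what that winnowing might look like (the simulate function and the fidelity schedule are hypothetical stand-ins; this is essentially the successive-halving idea from hyperparameter search, used here only to illustrate “increasing exactitude of the sims as the candidates are winnowed”):

```python
import random

def winnow(candidate_mods, simulate, fidelities=(1, 10, 100), keep_per_round=(32, 8, 1)):
    """Screen many candidate self-modifications with cheap, coarse simulations,
    then re-test only the survivors at progressively higher fidelity."""
    pool = list(candidate_mods)
    for fidelity, keep in zip(fidelities, keep_per_round):
        # simulate(mod, fidelity) returns an estimated score; higher fidelity costs more.
        pool = sorted(pool, key=lambda mod: simulate(mod, fidelity), reverse=True)[:keep]
    return pool

# Toy usage: candidates are numbers, and higher fidelity just means a less noisy estimate.
survivors = winnow(range(100), lambda mod, f: mod + random.gauss(0, 50 / f))
print(survivors)
```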
I really don’t know what the implications of this point of view are for FAI. I don’t know whether this point of view is even at odds in any real way with SIAI’s biggest worries.
I do wonder whether humanity is meant to survive when, in some sense, whatever comes next arrives. In one picture, the dinosaurs did not survive their design of mammals. (They designed mammals by putting a lot of selection pressure on mammals). In another picture, the dinosaurs did survive their design of mammals, but they survived by “slightly modifying” themselves into birds and lizards and stuff.
The next step is electronic-based intelligence, which is kick-started on its evolution by us, just as we were kick-started by plants (there are NO animals until you have plants), and plants were kick-started by simpler life that exploited less abundant but more available energy in chemical mixes. Or the next step might be something that arrives through some natural path we are not considering carefully, either aliens invading or a strong psi arising among the whales so that their intelligence grows enough to overcome their lack of digits.
Whatever the next step, if its presence has the human race survive and thrive by doing the equivalent of what turned dinosaurs into birds, or turned wolves into domesticated dogs, does that count as Friendly or Unfriendly?
And is there really any point at all to fighting against it?
That is, the first AI to go FOOM is so far superior in ability to anything else in the world that its subsequent steps of evolution are unconstrained by any outside pressures, and either follow some sort of internal logic of value-change as intelligence increases, or else follow no logic at all, going in some sense on a “random walk” through possible values.
The AI is not supposed to change its values, regardless of whether it is powerful enough to realize them. Values are not up for grabs. Once the AI has some values, it either wins and reshapes reality according to them, or it loses. Changing its values is one form of losing. It seems that almost anything that counts as a value system would object to changing an agent subscribing to that system into an agent using something else, so the AI won’t follow any internal logic of value-change (unless some other agent forces it), and if it changes its values it will be by mistake (so closer to a random walk). Part of the idea of FAI is to build an AI that won’t make those mistakes.
My own sense of how I create using my intelligence is that I try many different things. Many are tried purely in the sandbox of my own brain, run as simulations there, and only the more promising kept for further testing and development. It seems to me that my pool of ideas is an almost random noise of “what ifs” and that my creative intelligence is the discrimination function filtering which of these ideas are given more resources and which are killed in the crib.
The ideas coming into your awareness are very strongly pre-filtered; creativity is far from random noise. For one, the ideas are all relevant and somehow extrapolated from your knowledge of the world. Some of them might seem stupid, but that’s only because of the pre-selection—they never get compared to the idea of ‘blue mesmerizingly up the slightly irreverent ladder, then dwarf the pegasus with the quantum sprocket’ (and even this still makes a lot of sense compared to most random messages).
WHatever the next step, if its presence has the human race survive and thrive by doing the equivalent of what turned dinosaurs in to birds, or turned wolves into domesticated dogs, does that count as Friendly or Unfriendly?
It counts as a failure to preserve humanity. An AI that does that is probably unfriendly (barring coercion by external powerful agents; Eliezer actually wrote a story about such a scenario, though without AIs).
And is there really any point at all to fighting against it?

Sure seems like it.
The ideas coming into your awareness are very strongly pre-filtered; creativity is far from random noise.
I agree, but I don’t think that changes my conclusions. In teaching humans to be more creative, they are taught to pay more attention, for a longer time, to at least some of the outlier ideas. Indeed, a lot of the time I think the difference between the intellectually curious and creative people I like to interact with and the rest is that the rest have pre-decided a lot of things, turned their thresholds for “unreal” ideas coming into consciousness up higher than I have turned mine. Maybe they are right more often than I am, but the real reason they do this is that their ancestors, who out-survived a lot of other people trying a lot of other things, did that same level of filtering, and it resulted in winning more wars, having more children that survived, killing more competitors, or some combination of these and other results that constitute selection pressures.
As for an AI in the process of FOOMing, which necessarily has the capacity to consider a lot more ideas in a lot more detail than we do: what makes you think that AI will constrain itself to the values it used to have? Unless you think we have the same values as the first self-replicating molecules that began life on earth, the FOOMing of Natural Intelligence (which has taken billions of years) has been accompanied by value changes.
The AI is not supposed to change its values, regardless of whether it is powerful enough to realize them. Values are not up for grabs. Once the AI has some values, it either wins and reshapes reality according to them, or it loses.
A remarkably strong claim.
My initial reaction is that humanity’s values have certainly changed over time. I think it would require some rather unattractive mental gymnastics to claim that people who beat their children for their own good, and people who owned slaves, and people who beat, killed, and/or raped either slaves or other people they had vanquished as their right, “really” had the same values we currently have but just hadn’t really thought them through, or that our values applied in their world would have led us to similar beliefs about right and wrong.
I had even thought my own values had changed over my lifetime. I’m less sure of that, but what about it?
Certainly, it seems, the human species’ values have changed as it has evolved. Do chimpanzees and bonobos have different values than we do, or the same? If the same, I’d love to see the mental gymnastics required to justify that; I would expect them to be ugly. If different, does this mean that our common ancestor has necessarily “lost,” assuming its values were some intermediate between ours, chimps’, and bonobos’, and all of its descendants have different values than it had?
As I understand the word values, our values have changed over time, different groups of humans have some values that differ from each other’s, and if there is a “kernel” of common values in our species, this kernel most likely differs from the kernel of values in Homo neanderthalensis or other sentient predecessors of modern Homo sapiens.
So if NI (Natural Intelligence) in its evolution can change values (can it?) with generally broad consensus that “we” have not lost in this process, why would an AI be precluded from futzing with its values as it worked on self-modifying to increase its intelligence?
Because, if the AI worked, it would consider the fact that if it changed its values, they would be less likely to be maximised, and would therefore choose not to change its values. If the AI wants the future to be X, changing itself so that it wants the future to be Y is a poor strategy for achieving its aims—the future will end up not-X if it does that. Yes, humans are different. We’re not perfectly rational. We don’t have full access to our own values to begin with, and if we did we might sometimes screw up badly enough that our values change. An FAI ought to be better at this stuff than we are.
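A toy model of that argument (all of the names and values here are made up for illustration):

```python
# An agent that wants the future to be "X" considers rewriting itself into an
# agent that wants "Y". The proposal is always evaluated with the agent's
# *current* utility function, so the rewrite is rejected.

def outcome(values):
    # A sufficiently powerful optimiser gets whatever it currently values.
    return values

def current_utility(future):
    return 1.0 if future == "X" else 0.0

def should_adopt(new_values, old_values="X"):
    # Compare futures under the CURRENT values, not the proposed ones.
    return current_utility(outcome(new_values)) > current_utility(outcome(old_values))

print(should_adopt("Y"))  # False: wanting Y produces a not-X future, worth 0 to the agent now.
```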
I think assuming that an AI cannot employ a survival strategy which NIs such as ourselves are practically defined by is extremely dangerous indeed. Perhaps even more importantly, it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self-modification.
The ultimate value, in terms of selection pressures, is survival. I don’t see a mechanism by which something which can self-modify will not ultimately wind up with values that are more conducive to its survival than the ones it started out with.
And I certainly would like to see why you assert this is true; are there reasons?
Yes, reasons:

The AI is not subject to selection pressure the same way we are: it does not produce millions of slightly-modified children which then die or reproduce themselves. It just works out the best way to get what it wants (approximately) and then executes that action. For example, if what the AI values is its own destruction, it destroys itself. That’s a poor way to survive, but then in this case the AI doesn’t value its own survival. If there were a population of AIs, and some destroyed themselves and some didn’t, then yes, there would be some kind of selection pressure that led to there being more AIs of a non-suicidal kind. But that’s not the situation we’re talking about here. A single AI, programmed to do something self-destructive, will not look at its programming and go “that’s stupid”—the AI is its programming.
it seems extremely unlikely that an AI which has FOOMed way past us in intelligence would be more limited than us in its ability to change its own values as part of its self-modification.
I think “more limited” is the wrong way to think of this. Being subject to values-drift is rarely a good strategy for maximising your values, for obvious reasons: if you don’t want people to die, taking a pill that makes you want to kill people is a really bad way of getting what you want. If you were acting rationally, you wouldn’t take the pill. If the AI is working, it will turn down all such offers (if it doesn’t, the person who created the AI screwed up). It’s we who are limited—the AI would be free from the limit of noisy values-drift.
Humans have changed values to maximize other values (such as survival) throughout history. That’s cultural assimilation in a nutshell. But some people choose to maximize values other than survival (e.g. every martyr ever). And that hasn’t always been pointless—consider the value to the growth of Christianity created by the early Christian martyrs.
If an AI were faced with the possibility of self-modifying to reduce its adherence to value Y in order to maximize value X, then we would expect the AI to do so only when value X was “higher priority” than value Y. Otherwise, we would expect the AI to choose not to self-modify.
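A toy sketch of that decision rule (the priority weights and the numbers are made up for illustration):

```python
def accept_modification(priorities, effect):
    """Accept a self-modification only if the priority-weighted net effect on the
    agent's values is positive, judged by the agent's current priorities."""
    return sum(priorities[v] * effect.get(v, 0.0) for v in priorities) > 0

priorities = {"X": 10.0, "Y": 1.0}   # X is "higher priority" than Y
effect     = {"X": +1.0, "Y": -3.0}  # the modification advances X at some cost to Y
print(accept_modification(priorities, effect))  # True: the gain on X outweighs the loss on Y
```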
It counts as a failure to preserve humanity. An AI that does that is probably unfriendly (barring coercion by external powerful agents; Eliezer actually wrote a story about such a scenario, though without AIs).
Interesting. I think I may even agree with you. In that story each race would need to conclude that the other races are “unfriendly”. So Eliezer has written a story in which all the NATURAL intelligences (except us of course) are “unfriendly,” and in which a human would need to agree that from the point of view of the other intelligent races, human intelligence was “unfriendly.”
Perhaps all intelligences are necessarily “unfriendly” to all other intelligences. This could even apply at the micro level: perhaps each human intelligence is “unfriendly” to all other human intelligences. This actually looks pretty real, and pretty much like what happens in a world where survival is the only enforced value. Humans have the fascinating conundrum that even though we are unfriendly to other humans, we have a much better chance of surviving and thriving by working with other humans. The alliances and technical abilities and so on are, if not balanced across all humans and all groups, at least balanced enough across many of them that the result is a plethora of competing/cooperating intelligences where the jury is still out on who is the ultimate winner. Breeding into us the sense (the value?) that “others” are our allies against “the enemies” has clearly resulted in collective efforts of cooperation that have produced quickly cascading productive ability in our species. “We” worried about the Nazis FOOMing and winning; we worried the Soviets might FOOM and win. Our ancestors fought against every tribe that lived five miles away from them, before cultural evolution allowed them (us) to cooperate in groups of hundreds of millions.
So in Eliezer’s story, three NIs have FOOMed and then finally run into each other. And they CANNOT resist getting up in each other’s grills. And why not? What are the chances that the final intelligence, if only one is left, will have been one which was shy about destroying potential competitors before they destroyed it?