What is wrong with the statement? The idea I’m trying to portray is that I, as a person now, cannot go and forcefully rewire another person’s values. The only ability I have to affect them is to be persuasive in argument, or perhaps to be deceptive about certain things, to try to get them to a different position (e.g., consider the state of politics).
In contrast, one of the concerns for the future is that an AI may have the technological ability to more directly manipulate a person. So the question I’m asking is: is the future technology at the disposal of an AI the only reason it could behave “badly” under such a utility function?
Also, please avoid such comments. I am interested in having this discussion, but alluding to finding something wrong in what I have posted and not saying what you think it is, is profoundly unhelpful and useless to discussion.
And yet humanity is resistant to large-scale effects because we also combat changes in values that are destructive (like Nazism). Are you suggesting that through persuasive means an AI could convert the values of all humanity to something unsavory? I think this is a bit too negative a view of humanity. You might suggest conditioning from birth, but this will result in outrage from the rest of humanity, which the AI, by our utility definition, is trying to avoid.
I don’t think in context that’s terribly relevant. The point is that it was a large-scale result. That it could end up having been much worse when technology got better doesn’t really impact the point.
You make my point right there. World War 2. We went to war in defiance of the Nazis and refused to be assimilated. People in Germany didn’t even like what the Nazis were doing. And finally, the Nazis didn’t care about our outrage and death in the resulting war. An AI trying to maximize well-being will care profoundly about that, by definition.
You seem to think that you are living in a magical fair universe. Just because nothing really really bad happened to you/us yet, doesn’t mean it cannot.
I don’t think I live in a fair universe at all. Regardless, acknowledging that we don’t live in a fair universe doesn’t support your claim that an AI would be able to radically change the values of all humans on earth without outrage from others through persuasion alone.
I feel like I’ve already responded to this argument multiple times in various other responses I’ve made. If you think there’s something I’ve overlooked in those responses let me know, but this seems like a restatement of things I’ve already addressed. Also, if there is something in one of the responses I’ve made with which you disagree and have a different reason than what’s been presented, let me know.
The point that other humans fought against it doesn’t change the central point that a very large fraction of humans could have a radically different effective morality. Moreover, if Germany hadn’t gone to war but had instead done the exact same thing to its internal minorities, most of the world likely would not have intervened.
If you don’t like this example so much, one can just look at changing attitudes on many issues. See for example Pinker’s book “The Better Angels of Our Nature” where he documents extreme changes in historical attitudes about the ethics of violence. For example, war is considered much more of a negative now than it was a few centuries ago. Going to war to gain territory is essentially unthinkable today. Similarly, attitudes about animals have changed a lot. In the Middle Ages, forms of entertainment that were considered normal included not just bear-baiting and similar actions but such crude behavior as lighting a cat on fire and seeing how long it took to die. Our moral attitudes are very much a product of our culture and how we are raised.
Most of our changes to where we are now seem to be a result of what works better in a complex society, and I therefore have difficulty accepting that a society in the highly advanced state it would be in by the time we had strong AI could be pushed to a non-productive doomsday set of values. So let’s make the argument more clear then: what set of values do you think the AI could push us to through persuasion that would be effectively what we consider a doomsday scenario while also allowing the AI to more easily satisfy well-being?
I’m not sure why running a complex society needs to be a condition. If we all revert to hunter-gatherers then it still satisfies the essential conditions.
That’s a problem even if it isn’t a doomsday scenario. Changes in animal welfare attitudes would probably make most of us unhappy, but having a society where torturing cute animals to death is acceptable wouldn’t hurt running a complex society. Similarly, allowing infanticide would work fine (heck, for that one I can think of some pretty decent arguments why we should allow it). And while not a doomsday scenario, other scenarios that could suck a lot can also be constructed. For example, you could have a situation where we’re all stuck with 1950s gender roles. That would be really bad but wouldn’t destroy a complex society.
Hunter-gathering is not sustainable for a large-scale complex society. It is not a position we would favor at all, and I’m struggling to see why an AI would try to make us value that setup, or how you think a society with technology strong enough to make strong AI could be convinced of it.
Views on killing animals are more flexible, as the reason humans object to it seems to come from a level of innate compassion for life itself. So I could see that value being more manipulable as a result. I don’t see what that has to do with a doomsday set of values though.
1950s gender roles were abandoned because (1) women didn’t like them (in which case maximizing people’s well-being would suggest not having such gender roles) and (2) they were less productive for society, in that suppressing women limits the set of contributions to society.
I don’t think you’ve presented here a set of doomsday values that humans could be manipulated into holding by persuasion alone, or demonstrated why they would be a set of values the AI would prefer humans to have for maximization.
And yet humanity is resistant to large scale effects because we also combat changes in values that are destructive (like nazism).
Ah, implicit belief in moral progress (or at least in values and morality being preserved) and a universe where really bad things can’t happen. I sometimes miss that.
The Nazis were defeated not because they were destructive but because the Soviet Union, UK and the US were stronger. Speaking of Stalin, how does Communism fit into your model?
Let’s lose the silly straw man arguments. I’ve already explicitly commented on how I don’t believe the universe is fair, and I think from that it should be obvious that I don’t think really bad things can’t happen. As far as moral progress goes, I think it happens insofar as it’s functional. Morals that lead to more successful societies win the competition and stick around. This often happens to move societies (not necessarily all people in the society) toward greater tolerance of peoples and less violence, because oppressing people and allowing for more violence tends to have bad effects internally in the society.
If we were weaker the Nazis could have won. That’s not even the central point though. For kicks, let’s assume the Nazis had won the war. What does that mean though? It still means that other humans were in huge opposition and went to war over it, causing innumerable deaths. After the Nazis won, there would also surely be people wildly unhappy with the situation. This presents a serious problem for the AI trying to maximize well-being. It would not want to do things that led to mass outrage and opposition because that fails its own metrics.
Considering there were many people in Germany who vehemently disliked the Nazis too (even ignoring Jews), it seems like a pretty safe bet that after being conquered we wouldn’t have suddenly viewed the Nazis as great people. Why do you think otherwise?
The Japanese are rather fond of America, if I am not mistaken. I assume that it is not uncommon for the conquered to eventually grow satisfied with their conquerors.
Americans did rule Japan by military force for about five years after WWII ended, demilitarized the nation, and left behind a sympathetic government of American design. However, if you do not wish to use the word ‘conquer’ to describe such a process, that is your prerogative.
When you think of a nation conquering another, the US and Japan is really what comes to your mind? Are you honestly having trouble grasping the distinction I was making? Because personally, I’m really not interested in continuing an irrelevant semantics debate.
The US ran the Japanese government for a period of several years. I think you mean to add something about “run the country without intent to transfer power back to the locals”.
A confession: Often when reading LW, I will notice some claim that seems wrong, and will respond, without reading the thread context carefully or checking back far enough to understand exactly why a given point came up. This results in an inadvertent tendency to nit-pick, for which I apologize.
I appreciate that sentiment and I’ll also add that I appreciate that even in your prior post you made an effort to suggest what you thought I was driving at.
And as for the others? Or are you saying the AI trying to maximize well-being will try and succeed in effectively wiping out everyone and then condition future generations to have the desired easily maximized values? If so, this behavior is conditioned on the idea that the AI could be very confident in its ability to do so, because otherwise the chance of failing and the cost of war to human well-being would massively drop the expected value. I think you should also make clear what values you think it would try to change human values toward in order to maximize more easily.
It’s irrelevant. In a world of world-destroying technologies, a really bad thing happening for only a small amount of time is all it takes. The Cold War wasn’t even close to the horror of Nazi domination (probably)--there were still lots of happy people with decent governments in the west! But everyone still could have died.
What if Nazis had developed nuclear weapons? What if the AI self-reproduces, without self-improving, such that the Big Bad they’re supporting has an army of super-efficient researchers and engineers? What if they had gotten to the hydrogen bomb around the same time the US had gotten the atom bomb? What if the Big Bad develops nanomachines, programmable to self-replicate and destroy anyone who opposes, or who passes a certain boundary? What kind of rebellion or assassination attempt could stand up to that? What if the humans want the AI, rather than another human, to be the leader of their Big Bad Movement, making their evil leader both easily replicable and immune to nanomachine destruction?
Hell, what if the AI gets no more competent or powerful than a human? It can still, in the right position, start a thermonuclear war, just the same as high-level weapons techs or—hell!--technical errors can. Talented spies can make it to sufficiently high levels of government operation; why couldn’t a machine do so? Or hire a spy to do so?
And if the machine thinks that’s the best way to make people happy (for whatever horrible reason—perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we’re still in trouble.
However, if you’re trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won’t make such mistakes, then you are describing friendly AI.
Edit: In other words, I contend that the future threat of General AI is not in modifying humans with nanotechnology. It is in simple general ability to shape the world, even if that only means manipulating objects using current technologies. If we’re defining “intelligence” as the ability to manipulate atoms to shape the world according to our bliss points, a machine that can think thousands of times faster than humans will be able to do so at least hundreds of times better than humans. This is especially true if it can replicate, which, given this hypothesis, it will almost certainly be able to. If we add intelligence explosion to the mix, we’re in big trouble.
You’re missing the point of talking about opposition. The AI, unlike the Nazis, doesn’t want the outcome of opposition, because that has terrible effects on the well-being it’s trying to maximize. This isn’t about winning the war; it’s about the consequence of war on the measured well-being of people, and of other people who live in a society where an AI would kill people for what amounted to thought-crime.
And if the machine thinks that’s the best way to make people happy (for whatever horrible reason—perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we’re still in trouble.
This specifically violates the assumption that the AI has well modeled how any given human measures their well-being.
However, if you’re trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won’t make such mistakes, then you are describing friendly AI.
It is the assumption that it models human well-being at least as well as the best a human can model the well-being function of another. However, this constraint by itself does not solve friendly AI, because in a less constrained problem than the one I outlined, the most common response for an AI trying to maximize what humans value is that it will change and rewire what humans value to something easier to maximize. The entire purpose of this post is to question whether it could achieve this without the ability to manually rewire human values (e.g., could this be done through persuasion?). In other words, you’re claiming friendly AI is solved more easily than the constrained question I posed in the post.
Are you trying to argue that, of all the humans who have done horrible horrible things, not a single one of them 1) modeled other humans at the average or above-average level that humans usually model each other, and 2) thought they were trying to make the world better off? Or are you trying to argue that not a single one of them ever caused an existential threat?
My guess is that Lenin, for instance, had an above-average human-modeling mind and thought he was taking the first steps of bringing the whole world into a new prosperous era free from class war and imperialism. And he was wrong and thousands of people died. The kulaks opposed, in the form of destroying their farms. Lenin probably didn’t “want the outcome of opposition,” but that didn’t stop him from thinking mass slaughter was the solution.
The ability to model the well-being of humans and the “friendliness” of the AI are the same thing, provided the AI is programmed to maximize that well-being value. If your AI can’t ever make mistakes like that, it’s a friendly AI. If it can, it’s trouble whether or not it can alter human values.
Consider that every human who ever existed, was shaped purely by environment + genes.
Consider how much humans have achieved merely by controlling the environment: converting people to insane religions which they are willing to die and kill for, making torturers, “the banality of evil”, etc. etc.
Now imagine what an entity could achieve with that plus 1) complete understanding of how the brain is shaped by the environment and/or 2) complete control of the environment (via VR, smart dust, whatever) for a human from age 0 onwards.
I think the conservative assumption is that any mind we would recognize as human, and many we wouldn’t, could be produced by such an optimization process. You’re not limiting your AI at all.
But the AI isn’t being dropped into a completely undeveloped society. It will be dropped into an extremely developed society with values already existing. If the AI were dropped back into the era of early man, I could see major concern. I don’t see humanity having the values we’ve developed being radically and entirely changed into something we consider so unsavory by persuasion alone. That doesn’t mean no one could be affected, but I can’t see such a thing going down without outrage from large sects of humanity, which is not what the AI wants.
You underestimate “persuasion alone”. Please consider that (by your definition) all human opinions on all subjects that have existed to date, have been created pretty much “by persuasion alone”.
Also, I don’t want to live in a world where what I’m allowed to do or be is constrained by whether it provokes “outrages from large sects of humanity”. There are plenty of sects (properly so called ;-) today that don’t want me to continue existing even the way I already am, at least not without major brainwashing.
All human opinions cannot be created by persuasion alone because opinions have to start somewhere. People can and do think for themselves and that’s what creates opinions. Then they might persuade people to have these opinions as well, but clearly persuasion is not the sole source and even then it’s not like persuasion is a one-way process where you hit the persuade button and the other person is switched. It seems that your argument is that any human can be persuaded to any opinion at any time and I just can’t buy that. Humans are malleable and we’ve made a huge number of mistakes in the past, but I don’t see us as so bad that anyone can have their mind changed to anything regardless of the merit behind it. This entire site is based around getting people to not be arbitrarily malleable and to require rationality in making decisions—that there are objective conclusions and we should strive for them. Is this site and community a failure then? Are all of the people subject to mere persuasion in spite of rationality and cannot think for themselves?
Regarding actions that cause outrage I never said you were constrained by the outrage of others. I said an AI that maximizes human well-being is not going to take actions that cause extreme outrage.
All human opinions cannot be created by persuasion alone because opinions have to start somewhere. People can and do think for themselves and that’s what creates opinions.
This is completely wrong. Again, you give “persuasion” a very narrow scope.
A baby is born without language, certainly without many opinions. It can be shaped by its environment (“persuasion”) to be almost anything. Certainly, very few of the extremely diverse cultures and sub-cultures known from history have had any trouble raising their kids to behave like the adults, with only a small typical proportion of adolescents who left for another society. And these people had no understanding of how the brain really works—unlike what a superintelligent AI might have.
Short version: it doesn’t matter if people do think for themselves, because they only get to think about their sensory inputs and the AI can control those. Even a perfect Bayesian superintelligence would reach any conclusion you wished if you truly fully controlled all the information it ever received (as long as it had no priors of 0 or 1).
This entire site is based around getting people to not be arbitrarily malleable and to require rationality in making decisions [...] Is this site and community a failure then?
If you end up in an environment controlled by an unfriendly AI, having read this site won’t help you; it’s game over. LW rationality skills work in some worlds, not in any possible world.
Regarding actions that cause outrage I never said you were constrained by the outrage of others. I said an AI that maximizes human well-being is not going to take actions that cause extreme outrage.
How is this different from saying it’s not going to let me take actions that cause extreme outrage? I hope you aren’t planning on building an AI that has a sense of personal responsibility and doesn’t care if humans subvert its utility function as long as it didn’t cause them to do so.
There is a profound difference between being persuasive and manipulating all sensory input of a human. Is your argument not that it would try to persuade but that an AI would hook up all humans to a computer that controlled everything we perceived? If you want to make that your argument, I’m game for discussing it, but I think it should be made clear that this is a very different argument than an AI trying to change people’s minds through persuasion. But let’s discuss it. This suggestion of manipulating the senses of humans seems to imply a massive use of technology, and integration of that technology by the AI, not available today, but that’s okay; we should expect technology to improve incredibly by the time we can make strong AI. But so long as we’re assuming that such huge amounts of improved technology with large integration are available and would allow the AI to pull the wool over everyone’s eyes, we must also consider that humans have made use of that technology themselves to better themselves and provide wildly intelligent computer security systems, such that it seems a stretch to me to posit that an AI could do this without anyone noticing.
How is this different from saying it’s not going to let me take actions that cause extreme outrage? I hope you aren’t planning on building an AI that has a sense of personal responsibility and doesn’t care if humans subvert its utility function as long as it didn’t cause them to do so.
I suppose if your actions were extreme enough in the outrage they caused, we might make a case for those actions needing to be thwarted, even by the reasoning of the AI. I don’t know you, but my guess is you’re thinking perhaps of religious fundamentalists’ feelings about you? Such outrage on its own is (1) somewhat limited and counterbalanced by others and (2) counterproductive for humanity to act upon, in which case the better argument is not to thwart your actions but to work toward tolerance. But let’s contrast this with an AI trying to effectively replace mankind with easily satisfied humans, and consider how people would respond to that. I think it’s clear that humans would work toward shutting such an AI down and would respond with extreme concern for their livelihood. The fact that we’re sitting here talking about how this is a doomsday scenario seems to be evidence of that concern. Given that, it just doesn’t seem to be in the AI’s interest to make that choice; it would simply cause too much of a collapse in the well-being of humanity, given their profound concern for the situation.
Consider that humans have modified human values to results as different as Nazism and Jainism.
If you don’t think World War 2 was a large-scale effect, then I don’t know what to say to you.
Compared to what the effects of total war between large powers just 20 years later would have been like, WWII was a relatively small-scale effect.
Humans can radically change the values of humans through weak social pressure alone.
Considering what we think we know of the Nazis, are you sure about this one?
We also didn’t conquer Japan, we won the war. Those are two different things.
What sort of things would be different if it were the case that America conquered Japan?
Conquer is typically used to mean that you take over the government and run the country, not just win a war.
Yes. I find it odd that this argument is derailed into demanding a discussion on the finer points of the semantics for “conquer.”
A confession: Often when reading LW, I will notice some claim that seems wrong, and will respond, without reading the thread context carefully or checking back far enough to understand exactly why a given point came up. This results in an inadvertent tendency to nit-pick, for which I apologize.
I appreciate that sentiment and I’ll also add that I appreciate that even in your prior post you made an effort to suggest what you thought I was driving at.
For the Poles, at least, I fear it probable that not many would be around, say, 20 years after victory.
And as for the others? Or are you saying the AI trying to maximize well-being will try, and succeed, in effectively wiping out everyone and then conditioning future generations to have the desired, easily maximized values? If so, this behavior depends on the AI being very confident in its ability to do so, because otherwise the chance of failure and the cost of war would massively reduce the expected value of human well-being. I think you should also make clear what you think the values are that it would try to change human values into in order to maximize them more easily.
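To make the expected-value point concrete, here is a rough sketch. Every number and name in it is made up for illustration; the only claim is structural: if failure is catastrophic and conflict is costly, the plan beats the status quo only when the probability of success is very high.

```python
# Hypothetical well-being scores on an arbitrary scale (assumptions, not data).
STATUS_QUO = 50.0          # well-being if the AI leaves human values alone
VALUE_IF_SUCCESS = 100.0   # well-being (by the AI's metric) if the rewrite works
VALUE_IF_FAILURE = 0.0     # well-being if the attempt fails and opposition wins
COST_OF_CONFLICT = 30.0    # well-being lost to war and outrage either way


def expected_wellbeing(p_success: float) -> float:
    """Expected well-being of attempting the plan with success probability p_success."""
    gamble = p_success * VALUE_IF_SUCCESS + (1.0 - p_success) * VALUE_IF_FAILURE
    return gamble - COST_OF_CONFLICT


def plan_is_worth_it(p_success: float) -> bool:
    """True only if attempting the plan beats doing nothing, in expectation."""
    return expected_wellbeing(p_success) > STATUS_QUO


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9, 0.99):
        print(p, expected_wellbeing(p), plan_is_worth_it(p))
```

With these invented numbers the plan only pays off when the AI is more than 80% sure of success; a larger cost of conflict pushes that threshold higher still.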
It’s irrelevant. In a world of world-destroying technologies, a really bad thing happening for only a small amount of time is all it takes. The Cold War wasn’t even close to the horror of Nazi domination (probably)--there were still lots of happy people with decent governments in the west! But everyone still could have died.
What if Nazis had developed nuclear weapons? What if the AI self-reproduces, without self-improving, such that the Big Bad they’re supporting has an army of super-efficient researchers and engineers? What if they had gotten to the hydrogen bomb around the same time the US had gotten the atom bomb? What if the Big Bad develops nanomachines, programmable to self-replicate and destroy anyone who opposes, or who passes a certain boundary? What kind of rebellion or assassination attempt could stand up to that? What if the humans want the AI, rather than another human, to be the leader of their Big Bad Movement, making their evil leader both easily replicable and immune to nanomachine destruction?
Hell, what if the AI gets no more competent or powerful than a human? It can still, in the right position, start a thermonuclear war, just the same as high-level weapons techs or (hell!) technical errors can. Talented spies can make it to sufficiently high levels of government operation; why couldn’t a machine do so? Or hire a spy to do so?
And if the machine thinks that’s the best way to make people happy (for whatever horrible reason—perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we’re still in trouble.
However, if you’re trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won’t make such mistakes, then you are describing friendly AI.
Edit: In other words, I contend that the future threat of General AI is not in modifying humans with nanotechnology. It is in simple general ability to shape the world, even if that only means manipulating objects using current technologies. If we’re defining “intelligence” as the ability to manipulate atoms to shape the world according to our bliss points, a machine that can think thousands of times faster than humans will be able to do so at least hundreds of times better than humans. This is especially true if it can replicate, which, given this hypothesis, it will almost certainly be able to. If we add intelligence explosion to the mix, we’re in big trouble.
You’re missing the point of talking about opposition. Unlike the Nazis, the AI doesn’t want the outcome of opposition, because that has terrible effects on the well-being it’s trying to maximize. This isn’t about winning the war; it’s about the consequences of war for the measured well-being of people, including people who live in a society where an AI would kill people for what amounted to thought-crime.
This specifically violates the assumption that the AI has well modeled how any given human measures their well-being.
It is the assumption that it models human well-being at least as well as the best a human can model the well-being function of another. However, this constraint by itself does not solve friendly AI, because in a less constrained problem than the one I outlined, the most common response is that an AI trying to maximize what humans value will change and rewire what humans value into something easier to maximize. The entire purpose of this post is to question whether it could achieve this without the ability to manually rewire human values (e.g., could this be done through persuasion?). In other words, you’re claiming friendly AI is solved more easily than the constrained question I posed in the post.
Are you trying to argue that, of all the humans who have done horrible, horrible things, not a single one of them 1) modeled other humans at or above the level at which humans usually model each other, and 2) thought they were trying to make the world better off? Or are you trying to argue that not a single one of them ever caused an existential threat?
My guess is that Lenin, for instance, had an above-average human-modeling mind and thought he was taking the first steps of bringing the whole world into a new prosperous era free from class war and imperialism. And he was wrong and thousands of people died. The kulaks opposed, in the form of destroying their farms. Lenin probably didn’t “want the outcome of opposition,” but that didn’t stop him from thinking mass slaughter was the solution.
The ability to model the well-being of humans and the “friendliness” of the AI are the same thing, provided the AI is programmed to maximize that well-being value. If your AI can’t ever make mistakes like that, it’s a friendly AI. If it can, it’s trouble whether or not it can alter human values.
Consider that every human who ever existed, was shaped purely by environment + genes.
Consider how much humans have achieved merely by controlling the environment: converting people to insane religions which they are willing to die and kill for, making torturers, “the banality of evil”, etc. etc.
Now imagine what an entity could achieve with that plus 1) complete understanding of how the brain is shaped by the environment and/or 2) complete control of the environment (via VR, smart dust, whatever) for a human from age 0 onwards.
I think the conservative assumption is that any mind we would recognize as human, and many we wouldn’t, could be produced by such an optimization process. You’re not limiting your AI at all.
But the AI isn’t being dropped into a completely undeveloped society. It will be dropped into an extremely developed society with values already existing. If the AI were dropped back into the era of early man, I could see major concern. I don’t see the values humanity has developed being radically and entirely changed into something we consider so unsavory by persuasion alone. That doesn’t mean no one could be affected, but I can’t see such a thing going down without outrage from large sects of humanity, which is not what the AI wants.
You underestimate “persuasion alone”. Please consider that (by your definition) all human opinions on all subjects that have existed to date, have been created pretty much “by persuasion alone”.
Also, I don’t want to live in a world where what I’m allowed to do or be is constrained by whether it provokes “outrages from large sects of humanity”. There are plenty of sects (properly so called ;-) today that don’t want me to continue existing even the way I already am, at least not without major brainwashing.
Not all human opinions can be created by persuasion alone, because opinions have to start somewhere. People can and do think for themselves, and that’s what creates opinions. Then they might persuade other people to hold these opinions as well, but clearly persuasion is not the sole source, and even then it’s not like persuasion is a one-way process where you hit the persuade button and the other person is switched. It seems that your argument is that any human can be persuaded to any opinion at any time, and I just can’t buy that. Humans are malleable and we’ve made a huge number of mistakes in the past, but I don’t see us as so bad that anyone can have their mind changed to anything regardless of the merit behind it. This entire site is based around getting people to not be arbitrarily malleable and to require rationality in making decisions: that there are objective conclusions and we should strive for them. Is this site and community a failure then? Are all of these people subject to mere persuasion in spite of rationality, unable to think for themselves?
Regarding actions that cause outrage, I never said you were constrained by the outrage of others. I said an AI that maximizes human well-being is not going to take actions that cause extreme outrage.
This is completely wrong. Again, you give “persuasion” a very narrow scope.
A baby is born without language, certainly without many opinions. It can be shaped by its environment (“persuasion”) to be almost anything. Certainly, very few of the extremely diverse cultures and sub-cultures known from history have had any trouble raising their kids to behave like the adults, with only a small typical proportion of adolescents who left for another society. And these people had no understanding of how the brain really works—unlike what a superintelligent AI might have.
Short version: it doesn’t matter if people do think for themselves, because they only get to think about their sensory inputs and the AI can control those. Even a perfect Bayesian superintelligence would reach any conclusion you wished if you truly fully controlled all the information it ever received (as long as it had no priors of 0 or 1).
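A toy sketch of that last claim, under loud assumptions: the agent is a plain Bayesian updater on a single yes/no hypothesis, it treats every observation as independent honest evidence, and it does not model the possibility that its inputs are being selected. The function names and likelihood values are invented for illustration.

```python
def bayes_update(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Posterior P(H | observation) from the prior P(H) via Bayes' rule."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1.0 - prior) * likelihood_if_false)


def belief_after_curated_evidence(
    prior: float,
    n_observations: int,
    likelihood_if_true: float = 0.8,   # assumed P(observation | H)
    likelihood_if_false: float = 0.4,  # assumed P(observation | not H)
) -> float:
    """Belief in H after a controller feeds only observations that favor H."""
    belief = prior
    for _ in range(n_observations):
        belief = bayes_update(belief, likelihood_if_true, likelihood_if_false)
    return belief


if __name__ == "__main__":
    # A skeptical agent (1% prior) is walked toward near-certainty.
    print(belief_after_curated_evidence(0.01, 20))
    # Priors of exactly 0 or 1 never move, whatever evidence is shown.
    print(belief_after_curated_evidence(0.0, 20))
    print(belief_after_curated_evidence(1.0, 20))
```

Each curated observation doubles the odds in favor of the hypothesis here, so even a 1% prior ends up above 99% after twenty of them, while a prior of exactly 0 or 1 stays put. An agent that suspected its inputs were being filtered would need a richer model than this one.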
If you end up in an environment controlled by an unfriendly AI, having read this site won’t help you; it’s game over. LW rationality skills work in some worlds, not in any possible world.
How is this different from saying it’s not going to let me take actions that cause extreme outrage? I hope you aren’t planning on building an AI that has a sense of personal responsibility and doesn’t care if humans subvert its utility function as long as it didn’t cause them to do so.
There is a profound difference between being persuasive and manipulating all sensory input of a human. Is your argument not that it would try to persuade, but that an AI would hook up all humans to a computer that controlled everything we perceived? If you want to make that your argument, I’m game for discussing it, but I think it should be made clear that this is a very different argument from an AI trying to change people’s minds through persuasion. But let’s discuss it. This suggestion of manipulating the senses of humans seems to imply a massive use and integration of technology by the AI that is not available today, but that’s okay; we should expect technology to improve incredibly by the time we can make strong AI. But so long as we’re assuming that such hugely improved and widely integrated technology is available and would allow the AI to pull the wool over everyone’s eyes, we must also consider that humans will have used that technology to better themselves and to build wildly intelligent computer security systems, such that it seems a stretch to me to posit that an AI could do this without anyone noticing.
I suppose if your actions were extreme enough in the outrage they caused, we might make a case for those actions needing to be thwarted, even by the reasoning of the AI. I don’t know you, but my guess is you’re thinking perhaps of religious fundamentalists’ feelings about you? Such outrage on its own is (1) somewhat limited and counterbalanced by others and (2) counterproductive for humanity to act upon, in which case the better response is not to thwart your actions but to work toward tolerance. But let’s contrast this with an AI trying to effectively replace mankind with easily satisfied humans, and consider how people would respond to that. I think it’s clear that humans would work toward shutting such an AI down and would respond with extreme concern for their livelihood. The fact that we’re sitting here talking about how this is a doomsday scenario seems to be evidence of that concern. Given that, it just doesn’t seem to be in the AI’s interest to make that choice; it would simply cause too much of a collapse in the well-being of humanity, given their profound concern about the situation.