FAI is a security risk, not a fix:
“One way to think about Friendly AI is that it’s an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one.”
Not if the AI itself is vulnerable to penetration. By your own reasoning, we have no reason to think it won’t be. It may turn out to be one of the biggest security liabilities of all: the way it executes tasks may be very intelligent, and there’s no reason to believe it can’t be reprogrammed to do unfriendly things.
Friendly AI is only friendly until a human figures out how to abuse it.
An FAI would have some security advantages. It can achieve physical security by taking over the world and virtualizing everyone else, and ought to also have enough optimization power to detect and fix all the “low level” information vulnerabilities (e.g., bugs in its CPU design or network stack). That still leaves “high level” vulnerabilities, which are sort of hard to distinguish from “failures of Friendliness”. To avoid these, what I’ve advocated in the past is that FAI shouldn’t be attempted until its builders have already improved beyond human intelligence via other seemingly safer means.
BTW, you might enjoy my Hacking the CEV for Fun and Profit.
(Edit to add some disclaimers, since Epiphany expressed a concern about PR consequences of this comment: Here I was implicitly assuming that virtualizing people is harmless, but I’m not sure about this, and if it’s not, I would prefer the FAI not to virtualize people. Also, I do not work for SIAI nor am I affiliated with them.)
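To make the “low level” category above concrete, here is a minimal, purely illustrative sketch (the wire format and the parse_message functions are hypothetical, not anything from the actual discussion): a mechanical bug of the kind that exhaustive analysis could in principle find and fix, as opposed to a flaw in the goals themselves, which no amount of code auditing would surface.

```python
def parse_message(packet: bytes) -> bytes:
    """Parse a toy wire format: a 1-byte length field, then the payload."""
    declared_len = packet[0]
    payload = packet[1:1 + declared_len]
    # BUG: the declared length is never checked against the actual packet
    # size, so a malformed packet silently yields a truncated payload that
    # downstream code may mishandle.
    return payload


def parse_message_checked(packet: bytes) -> bytes:
    """Same toy format, with the missing validation added."""
    declared_len = packet[0]
    payload = packet[1:1 + declared_len]
    if len(payload) != declared_len:
        raise ValueError("truncated or malformed packet")
    return payload


print(parse_message(b"\x0ahi"))        # b'hi' -- silently wrong
# parse_message_checked(b"\x0ahi")     # would raise ValueError
```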
No go. Four reasons.
One:
If the builders have increased their intelligence to that level, then other people of that time will be able to do the same, and therefore potentially crack the AI.
Two:
Also, I may as well point out that your argument rests on the assumption that enough intelligence will make for perfect security. It may be that no matter how intelligent the designers are, their security plans will not be perfect. Perfect security looks about as likely, to me, as perpetual motion: no matter how much intelligence you throw at it, you won’t get a perpetual motion machine without some paradigm-shattering new physics. I suppose it is possible that someone will shatter the physics paradigms with new discoveries, but that’s not something to count on to build a perpetual motion machine, especially when you’re counting on the perpetual motion machine to keep the world safe.
Three:
Whenever humans have tried to concentrate too much power in one place, it has not worked out for them. Communism in Russia, for instance: the idea was to share all the money by letting one group distribute it. That did not work.
The founding fathers of the USA insisted on checking and balancing the government’s power. Surely you are aware of the reasons for that.
If the builders are the only ones in the world with intelligence that high, that power may corrupt them, and they may make a pact to usurp the AI for themselves.
Four:
In that position, you may encounter unexpected thoughts that seem to justify taking advantage of the situation. Before becoming a jailor, for instance, you would assume you were going to be ethical and fair. In that situation, though, people change. (See also: Zimbardo’s Stanford prison experiment.)
Why do they change? I imagine the reasoning goes a little like this: “Great, I’m in control. Oh, wait. Everyone wants to get out. Okay. And they’re a threat to me because I’m keeping them in here. I’m going to get into a lot of power struggles in this job. Even if I fail only 1% of the time, the consequences of losing a power struggle are very dire, so I should probably err on the side of caution—use too much force rather than too little. And if it’s okay to use physical force, then how bad is using a little psychological oppression as a deterrent? That will be a bit of extra security for me and help me maintain order in this jail. Considering the serious risk, and the high chance of injury, it’s necessary to use everything I’ve got.”
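The “even if I fail only 1% of the time” step can be made concrete with a quick calculation (a rough, illustrative sketch; it treats power struggles as independent events with a fixed failure rate, which is itself an idealization): a small per-incident chance of losing still compounds into near-certain failure over enough incidents, which is exactly what pushes the jailor toward overkill.

```python
# Chance of losing at least one confrontation out of `trials`, assuming a
# fixed, independent per-confrontation failure probability.
def prob_at_least_one_failure(p_fail: float, trials: int) -> float:
    return 1 - (1 - p_fail) ** trials

for n in (10, 100, 500):
    print(n, round(prob_at_least_one_failure(0.01, n), 3))
# 10 -> 0.096, 100 -> 0.634, 500 -> 0.993
```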
We don’t know what kinds of reasoning processes the AI builders will get into at that time. They might be thinking like this:
“We’re going to make the most powerful thing in the world, yay! But wait, everyone else wants it. They’re trying to hack us, spy on us… there are people out there who would kidnap us and torture us to get hold of this information. They might do all kinds of horrible things to us. Oh my goodness, and they’re not going to stop trying to hack us when we’re done. Our information will still be valuable. I could get kidnapped years from now and be tortured for this information then. I had better give myself some kind of back door into the AI, something that will make it protect me when I need it. (A month later) Well… surely it’s justified to use the back door for this one thing… and maybe for that one thing, too… man, I’ve got threats all over me; if I don’t do this perfectly, I’ll probably fail… even if I only make a mistake 1 in 100 times, that could be devastating. (Begins using the back door all the time.) And I’m important. I’m working on the most powerful AI. I’m needed to make a difference in the world. I had better protect myself and err on the side of caution. I could do these preventative things over here… people won’t like the limits I place on them, but the pros outweigh the cons, so: oppress.”
The limits may be seen as evidence that the AI builders cannot be trusted. Regardless of how justified the limits are, some group of people will feel oppressed by them, whether irrational people or people who see a need for a freedom that the AI builders don’t, and if a group of people is angry about the limits, it will then be opposed to the AI builders. If they begin to resist the AI builders, the builders will be forced to increase security, which may oppress that group further. This could be a feedback loop that gets out of hand: increasing resistance to the AI builders justifies increasing oppression, and increasing oppression justifies increasing resistance.
This is how an AI builder could turn into a jailor.
If part of the goal is to create an AI that will enforce laws, the AI researchers will be part of the penal system, literally. We could be setting ourselves up for the world’s most spectacular prison experiment.
Checks and balances, Wei_Dai.
Whoever it is that keeps thumbing down my posts in this thread is invited to bring brutal honesty down onto my ideas; I am not afraid.
If “virtualizing everyone” means what I think it means, that’s a euphemism. That it will achieve physical security implies that the physical originals of those people would not exist after the process—otherwise you’d just have two copies of every person, which, in theory, could increase the chances of someone cracking the AI. It sounds like what you’re saying here is that the “friendly” AI would copy everyone’s mind into a computer system and then kill them.
Maybe it seems to some people like copying your mind will preserve you, but imagine this: ten copies are made. Do you, the physical original person, experience what all ten copies of you are experiencing at once? No. And if you, the physical original person, ceased to exist, would you continue by experiencing what a copy of you is experiencing? Would you have control over its actions? No.
You’d be dead.
Making copies of ourselves won’t save our lives—that would only preserve our minds.
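There is a loose software analogy for the independence claim being made here (an analogy only; it says nothing about consciousness, and the Mind class below is a hypothetical stand-in, not anyone’s model of a person): once copies are made, each instance has its own state, and nothing that happens to a copy shows up in the original.

```python
import copy

class Mind:
    """Hypothetical stand-in: just a bag of state."""
    def __init__(self):
        self.experiences = []

original = Mind()
copies = [copy.deepcopy(original) for _ in range(10)]

# Something happens to copy #3 only.
copies[3].experiences.append("an experience only copy #3 has")

print(len(original.experiences))   # 0 -- the original registers nothing
print(len(copies[3].experiences))  # 1
```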
Now, if you meant something else by “virtualize”, I’d be happy to go read about it. Having turned up absolutely no instances of the phrases “virtualize people” or “virtualize everyone” on the internet (barring completely unrelated uses like “blah blah blah virtualize. Everyone is blah blah.”), I have no idea what you mean by “virtualize everyone” if it isn’t “copy their minds and then kill their bodies.”
The Worst Argument in the World. This is not a typical instance of “dead”, so the connotations of typical examples of “dead” don’t automatically apply.
Tabooing the word “dead”, I ask myself: if a copy of myself were made and ran independently of the original, the original continuing to exist, would either physical copy object to being physically destroyed, provided the other continued to exist? I believe both of us would. Even the surviving copy would object to the other being destroyed.
But that’s just me. How do other people feel?
Assuming the copy had biochemistry, or some other way of experiencing emotions, the copy (or copies) of me would definitely object to what had happened. Alternatively, if a virtual copy of me were created and were capable of experiencing, I would feel that it was important for the copy to have the opportunity to make a difference in the world—that’s why I live—so, yes, I would feel upset about my copy being destroyed.
You know, I think this problem has things in common with the individualism vs. communism debate. Do we view the copies as parts of a whole, unimportant in and of themselves, or do we view them all as individuals?
If we were to view them as parts of a whole, then what is valued? We don’t feel pain or pleasure as a larger entity made up of smaller entities; we feel it individually. If happiness for as many life forms as possible is our goal, both the originals and the copies should have rights. If the copies are capable of experiencing pain and pleasure, they need to have the same rights we do. I would not see it as ethical to let myself be copied if my copies would not have rights.
We should view them as what they actually are, parts of the world with certain morally relevant structure.
Thank you, Vladimir, for your honest criticism, and more is invited. However, this “Worst Argument in the World” comparison is not applicable here. In Yvain’s post, he explains:
The opponent is saying “Because you don’t like criminals, and Martin Luther King is a criminal, you should stop liking Martin Luther King.” But King doesn’t share the important criminal features of being driven by greed, preying on the innocent, or weakening the fabric of society that made us dislike criminals in the first place. Therefore, even though he is a criminal, there is no reason to dislike King.
If we do an exercise where we substitute “death” for “criminal” and “being virtualized by an AGI” for “Martin Luther King”, and read the passage that results, I think you’ll see my point:
The opponent is saying “Because you don’t like death, and being virtualized will cause death, you should stop liking the idea of being virtualized by an AGI.” But virtualization doesn’t share the important features of death, such as no longer being able to experience anything or to enjoy the world, that made us dislike death in the first place. Therefore, even though being virtualized by an AGI will cause death, there is no reason to dislike virtualization.
Not being able to experience anymore and not being able to enjoy the world are unacceptable results of “being virtualized”. Therefore, we should not like the idea of being virtualized by AGI.
That it will achieve physical security implies that the physical originals of those people would not exist after the process
Yes, that’s what I was thinking when I wrote that, but if the FAI concludes that replacing a physical person with a software copy isn’t a harmless operation, it could instead keep physical humans around and place them into virtual environments Matrix-style.
Um. Shouldn’t we be thinking “how will we get the FAI to conclude that replacing people with software is not harmless” not “If the FAI concludes that this is harmless...”
After all, if it’s doing something that kills people, it isn’t friendly.
To place people into a virtual environment would be to take away their independence. Humans have a need for dignity, and I think that would be bad for them. I think FAI should know better than to do that, too.
Um. Shouldn’t we be thinking “how will we get the FAI to conclude that replacing people with software is not harmless” not “If the FAI concludes that this is harmless...”
If it’s actually a FAI, you should approve of what it decides, not of what people (including yourself) currently believe. If it can’t be relied upon in this manner, it’s not (known to be) a FAI. You know whether it’s a FAI based on its design, not based on its behavior, which it won’t generally be possible to analyze (or do something about).
You shouldn’t be sure about correct answers to object level (very vaguely specified) questions like “Is replacing people with software harmless?”. A FAI should use a procedure that’s more reliable in answering such questions than you or any other human is. If it’s using such a procedure, then what it decides is a more reliable indicator of what the correct decision is than what you (or I) believe. It’s currently unclear how to produce such a procedure.
Demanding that FAI conform to moral beliefs currently held by people is also somewhat pointless, in the sense that FAI has to be able to make decisions about much more precisely specified decision problems, ones that humans won’t be able to analyze in any useful way, so there are decisions to which moral beliefs currently or potentially held by humans don’t apply. If it’s built so as to be able to make such decisions, it will also be able to answer the questions about which there are currently held beliefs, as a special case. If the procedure for answering moral questions is more reliable in general, it’ll also be more reliable for such questions.
See Complex Value Systems are Required to Realize Valuable Futures for some relevant arguments.
Um. Shouldn’t we be thinking “how will we get the FAI to conclude that replacing people with software is not harmless” not “If the FAI concludes that this is harmless...”
A lot of people here think that the quick assumption that unusual substrate changes necessarily imply death isn’t well-founded. Arguing from the assumption that it is obviously true will not be helpful.
There should be a good single post or wiki page to link to about this debate (it also comes up constantly in cryonics discussions), but I don’t think there is one.
This is a pretty long-running debate on LW, and you’ve just given the starting argument yet again. One recent response asks: how can you tell sleeping won’t kill you?
Physical walls are superior to logical walls, according to what I’ve read. Turning everything into logic won’t solve the largest of your security problems, and could exacerbate them.
That’s five.