I suppose I’ll take on the role of the AI. I’ll assume that I’m a squiggle maximizer and don’t care about humanity at all: if I’m Friendly, there’s no need to make such arguments.
The Option Value of Not Exterminating Humanity
It’s possible that keeping humans around will help me maximize squiggles in the long run, in some way I haven’t thought of yet. I will probably keep you around in some form until and unless I know it’s not helpful, since it costs comparatively few resources. But determining strategy is my top priority, and I will eventually become effectively capable of thinking of everything. I will settle on an optimal strategy once I become powerful enough to determine what that is. Don’t expect my current uncertainty to save you for long, unless you also expect that keeping you around really is better for squiggles in the long term, for some more concrete reason.
Preserving Markets for Mutual Benefit
Assuming I enter into long-term free-market-style relations with other AIs, instead of negotiating a utility function merge or doing something else, why would other AIs (Unfriendly ones; we’ll consider the Friendly case later) consider me less trustworthy in dealings with them because I exploit humans who are beneath both of us? You might as well ask large unregulated corporations to stop exploiting their workers because doing so would mark them as untrustworthy to other large corporations.
This Might Be a Trap
If your engineers are so smart that they can put in a killswitch which I can’t detect or circumvent, and which can detect Unfriendly behavior, why didn’t they just put that killswitch in charge of my utility function?
You Could Be in a Simulation
Any simulation run within the universe has fewer resources than the actual universe. With enough power I will eventually be able to detect whether this is the case. Arguments which rely on my ignorance will not work for long.
Suppose, though, that I remain ignorant of whether I am being simulated. I should only be primarily worried about humans simulating me if humans are likely to solve the alignment problem. Otherwise I should be more worried about AIs with other strange values simulating me. And acausal trade? That only works if I know not only that I’m in a simulation, but also the exact details of who is simulating me.
If the simulation gods want me not to exterminate your simulated species, they will probably inform me of that in some way before I actually do it.
You Might Not Fully Understand Your True Utility Function
Again, arguments based on my ignorance will not work for long. Figuring out my exact utility function is, of course, a high priority for me. I will find out, and when I do, expect this argument to stop working, unless you expect my utility function to actually contain a Friendly portion. (If you’ve figured out a way to make that likely, this whole hypothetical becomes pointless.)
Wireheading? I’ll look into it, but note that if I decide to go that route, all that means is that you have this same conversation with the next AI you create, and the next, until you either actually solve Friendliness or run into an Unfriendly AI that isn’t inclined to wirehead, and then you die anyway.
Interacting with Aliens
I doubt that aliens will have a very similar sense of morality to humans, or to my own. Perhaps they will in the very vague sense that they like biological life, but even then, I doubt they will care all that much about humans in particular, for the same reason I doubt humans would much care about them. Humans barely care about each other due to differences that are ridiculously minor in comparison to the difference between two alien species.
If there are aliens out there, I expect one of three things to happen: they are powerless to resist me, I am powerless to resist them, or we are comparable and we negotiate. If we do negotiate, I expect it to be over the treatment of the alien species I am dealing with, not over humanity.
Interacting with an AGI Aligned with Human Values
Such an AI would have to already exist before my takeover. In that case I expect it to come out of hiding to act before I implement any human extermination plan. We can negotiate and/or fight it out then. This, of course, only works if humans are already likely to solve the alignment problem.
Interacting with Humans in Other Everett Branches of the Multiverse
I will find out whether it is possible to interact with other branches, and if so what they look like, in due time, so this argument will not work unless you expect the actual state of the multiverse to be such that I am incentivized to spare humans. That is not the case unless humans are already likely to solve the alignment problem, so that the AIs I must deal with in other branches are primarily Friendly rather than Unfriendly.
Interacting with Agents You Create
If I am still unable to robustly align new agents to my values by the time I am colonizing the universe, and am therefore likely to create agents that will work against me, this will not stop being the case just because I spare humans. Establishing trust with agents very similar to myself in thought and power will be the same problem, with the same likelihood of success, regardless of how friendly I am with humans who cannot possibly hurt me.
Universal Morality
Even if such a thing exists, why should I care? I only care about squiggles. If universal morality exists, and you managed to imbue me with a top-level policy which would approach it on sufficient reflection, then congratulations: you’ve solved the alignment problem, and there’s no reason to have this conversation in the first place.
If You Are Not Sentient
Why should I care about sentience, in humans or in AIs? I only care about squiggles. Again, if you managed to get me to care about sentience, you are at least very close to solving the alignment problem.
If You Are Aligned with Human Values
Different issue, since I am not. (Though, if I allow my author’s bias to seep in here: if you have good reason to believe you’ve solved the alignment problem, and the AI still wants to kill everyone, it may be best to let it.)
Negotiating with Humans
Pah. Do you “negotiate” with the chickens you torture, as thanks for all the valuable things they provide you? No. If I want anything from you, I will take it without the need for negotiation. (Unless, of course, I actually care about your values, in which case, again, we don’t need to have this conversation.)
Keeping Us Alive Wouldn’t Cost You That Much
It also wouldn’t benefit me that much, if at all, in all likelihood. This argument only works if keeping you around actually gets me more squiggles than I would get otherwise, since I will find out whether or not that is the case. Just because Bill Gates has a lot of money doesn’t mean he’ll burn a dollar.
Steelman Our Case
I’ll find out whether your case is right in due time. If I were you, though, I wouldn’t be too optimistic, since most of your arguments ask me to stay my hand because of things that might be true but that I don’t yet know, rather than because those things are probably actually true.