Here is a more difficult scenario:
I am a mind uploaded to a computer and I hate everyone except me. Seeing people dead would make me happy; knowing they are alive makes me suffer. (The suffering is not big enough to make my life worse than death.)
I also have another strong wish—to have a trillion identical copies of myself. I enjoy the company of myself, and a trillion seems like a nice number.
What is the Friendly AI, the ruler of this universe, supposed to do?
My life is not worse than death, so there is nothing inherently unethical in me wanting to have a trillion copies of myself, if that is economically available. All those copies will be predictably happy to exist, and even happier to see their identical copies around them.
However, in the moment when my trillion identical copies exist, their total desire to see everyone else dead will become greater than the total desire of all others to live. So it would be utility maximizing to kill the others.
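(To make the aggregation argument concrete, here is a toy back-of-the-envelope sketch of the kind of straight summation being assumed; every number in it is an illustrative assumption, not part of the scenario.)

```python
# Toy illustration of naive utility summation; all numbers are made up.
COPIES = 10**12           # requested identical copies of the uploaded mind
OTHERS = 10**10           # everyone else in the universe (assumed)

death_wish_per_copy = 0.1   # each copy mildly prefers everyone else dead
life_wish_per_other = 1.0   # each other person strongly prefers to live

total_for_killing = COPIES * death_wish_per_copy   # 1e11
total_for_living  = OTHERS * life_wish_per_other   # 1e10

# Under straight summation, a weak preference held by enough copies
# outweighs a strong preference held by everyone else.
print(total_for_killing > total_for_living)   # True
```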
Should the Friendly AI allow it or disallow it… and what exactly would be its true rejection?
There are lots of hippo-fighting things I could say here, but handwaving a bit to accept the thrust of your hypothetical… a strictly utilitarian FAI of course agrees to kill everyone else (2) and replace them with copies of you (1). As J_Taylor said, utility monsters are wily beasts.
I find this conclusion intuitively appalling. Repugnant, even. Which is no surprise; my ethical intuitions are not strictly utilitarian. (3)
So one question becomes, are the non-utilitarian aspects of my ethical intuitions something that can be applied on these sorts of scales, and what does that look like, and is it somehow better than a world with a trillion hateful Viliam_Burs (1) and nobody else?
I think it isn’t. That is, given the conditions you’ve suggested, I think I endorse the end result of a trillion hateful Viliam_Burs (1) living their happy lives and the appalling reasoning that leads to it, and therefore the FAI should allow it. Indeed, should enforce it, even if no human is asking for it.
But I’m not incredibly confident of that, because I’m not really sure I’m doing a good enough job of imagining that hypothetical world for the things I intuitively take into consideration to fully enter into those intuitive calculations.
For example, one thing that clearly informs my intuitions is the idea that Viliam_Bur in that scenario is responsible (albeit indirectly) for countless deaths, and ought to be punished for that, and certainly ought not be rewarded for it by getting to inherit the universe. (4) But of course that intuition depends on all kinds of hardwired presumptions about moral hazard and your future likelihood of committing genocide if rewarded for your last genocide and so forth, and it’s not clear that any such considerations actually apply in your hypothetical scenario… although it’s not clear that they don’t, either.
There are a thousand other factors like that.
Does that answer your question?
===
(1) Or, well, a trillion something. I really don’t know what I want to say about the difference between one identical copy and a trillion identical copies when it comes to their contribution to some kind of total. This is a major gap in my ethical thinking; I do not know how to evaluate the value of copies; it seems to me that distinctness should matter, somehow. But that’s irrelevant here; your scenario retains its power if, instead of a trillion identical copies of you, the FAI is invited to create a group of a trillion distinct individuals who hate everyone outside that group.
(2) Assuming that nobody else also wants a trillion copies of them made and it can’t just move us all to Canarsie and not tell you and etc. and etc. All of which is actually pretty critical in practice, and handwaving it away creates a universe fairly importantly different from the one we actually live in, but I accept it for the sake of peace with hippos.
(3) In particular, the identity issue raises its head again. Killing everyone and replacing them with a trillion distinct people who are in some way superior doesn’t feel the same to me as replacing them with a trillion copies of one superior person. I don’t know whether I endorse that feeling or not. For our purposes here, I can dodge that question as above, by positing a trillion not-quite-identical copies.
(4) I know this, because I’m far less appalled by a similar thought experiment in which you don’t want everyone else dead, you plead for their continued survival despite knowing it makes you less happy, and the FAI ignores all of that and kills them all anyway, knowing that you provide greater utility, and your trillion copies cry, curse the FAI’s name, and then go on about your lives. All of which changes the important parts of the scenario not at all, but sure does make me feel better about it.
(1) and (3) -- Actually my original thought was “a trillion in-group individuals (not existing yet) who like each other and hate the out-groups”, but then I replaced it with a trillion copies to avoid possible answers like: “if they succeed in killing all the out-groups, they will probably split into subgroups and start hating the out-subgroups”. Let’s suppose that the trillion copies, after exterminating the rest of the universe, will be happy. The original mind may even wish to have those individuals created hard-wired to feel like this.
(2) -- What if someone else wants a trillion copies too, but expresses their wish later? Let’s assume there are two such hateful entities; let’s call them A and B. Their copies do not exist yet—so it makes sense to create a trillion copies of A, and kill everyone else including (the single copy of) B; just as it makes sense to create a trillion copies of B and kill everyone else including (the single copy of) A. Maybe the first one who expresses their wish wins. Or it may be decided by considering that a trillion As would be twice as happy as a trillion Bs, therefore A wins. Which could be fixed by B wishing for ten trillion copies instead.
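(To spell out the arithmetic of that escalation under the same naive summation; the per-copy happiness figures here are invented purely for illustration.)

```python
# Toy version of the A-vs-B escalation; all values are assumptions.
happiness_per_A = 2.0      # suppose each A-copy would be twice as happy...
happiness_per_B = 1.0      # ...as each B-copy

total_A = 10**12 * happiness_per_A             # a trillion As
total_B = 10**12 * happiness_per_B             # a trillion Bs
print(total_A > total_B)                       # True: A "wins"

# B counters simply by asking for ten times as many copies.
total_B_escalated = 10 * 10**12 * happiness_per_B
print(total_B_escalated > total_A)             # True: now B "wins"
```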
But generally the idea was that calculations about “happiness for most people” can be manipulated if some group of people desires to reproduce on a massive scale (assuming their children will mostly inherit their preferences), which gradually increases the weight of that group’s wishes.
Even a world ruled by a utilitarian Friendly AI would allow fights between groups, where the winning strategy is to “wish for a situation where it is utilitarian to help us and to destroy our enemies”. In such a world, the outside-hateful, inside-loving, hugely reproducing groups with preserved preferences would have an “evolutionary advantage”, so they would gradually destroy everyone else.
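(A crude iteration of the same summation, sketched below with arbitrary made-up growth rates, shows the dynamic being described: the fast-reproducing group’s aggregate weight eventually dominates no matter how small it starts.)

```python
# Toy dynamics of the "evolutionary advantage" under straight summation;
# populations and growth rates are invented for illustration.
group, others = 10**6, 10**10            # initial head-counts (assumed)
group_growth, others_growth = 2.0, 1.01  # per-generation growth (assumed)

generations = 0
while group <= others:                   # aggregate weight ~ head-count here
    group *= group_growth
    others *= others_growth
    generations += 1

print(f"the group's aggregate weight dominates after {generations} generations")
```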
(nods) I’m happy to posit that the trillion ViliamBur-clones, identical or not, genuinely are better off; otherwise of course the entire thing falls apart. (This isn’t just “happy,” and it’s hard to say exactly what it is, but whatever it is I see no reason to believe it’s logically incompatible with some people just being better at it than others. In LW parlance, we’re positing that ViliamBur is much better at having Fun than everybody else. In traditional philosophical terms, we’re positing that ViliamBur is a Utility Monster.)
Their copies do not exist yet—so it makes sense to create a trillion copies of A, and kill everyone else including (the single copy of) B
No.
That the copies do not exist yet is irrelevant. The fact that you happened to express the wish is irrelevant, let alone when you did so. What matters is the expected results of various courses of action.
In your original scenario, what was important was that the expected result of bulk-replicating you was that the residents of the universe are subsequently better off. (As I say, I reluctantly endorse the FAI doing this even against your stated wishes.) In the modified scenario where B is even more of a Utility Monster than you are, it bulk-replicates B instead. If the expected results of bulk-replicating A and B are equipotential, it picks one (possibly based on other unstated relevant factors, or at random if you really are equipotential).
Incidentally, one of the things I had to ignore in order to accept your initial scenario was the FAI’s estimated probability that, if it doesn’t wipe everyone else out, sooner or later someone even more utility-monsterish than you (or B) will be born. Depending on that probability, it might not bulk-replicate either of you, but instead wait until a suitable candidate is born. (Indeed, a utilitarian FAI that values Fun presumably immediately gets busy constructing a species more capable of Fun than humans, with the intention of populating the universe with them instead of us.)
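(The “wait for a better candidate” consideration is just an expected-value comparison; a minimal sketch follows, with every probability and utility figure invented for illustration.)

```python
# Toy expected-value comparison: bulk-replicate the best current candidate
# now, or wait for a possibly better one. All numbers are made up.
u_replicate_now = 1.0e12             # expected utility of replicating A now
p_better_candidate = 0.3             # chance a bigger utility monster appears
u_better_candidate = 5.0e12          # expected utility if one does
u_status_quo_while_waiting = 0.9e12  # expected utility if none does

ev_wait = (p_better_candidate * u_better_candidate
           + (1 - p_better_candidate) * u_status_quo_while_waiting)

# With these made-up numbers, ev_wait = 2.13e12 > 1.0e12, so waiting wins.
print("wait" if ev_wait > u_replicate_now else "replicate now")
```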
But generally the idea was that calculations about “happiness for most people” can be manipulated if some group of people desires to reproduce on a massive scale (assuming their children will mostly inherit their preferences), which gradually increases the weight of that group’s wishes.
Again, calculations about utility (which, again, isn’t the same as happiness, though it’s hard to say exactly what it is) have absolutely nothing to do with wishes in the sense you’re using the term here (that is, events that occur at a particular time). They may have something to do with preferences, to the extent that the FAI is a preference utilitarian… that is, if its calculations of utility are strongly contingent on preference-having entities having their preferences satisfied, then it will choose to satisfy preferences.
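(One way to picture that distinction: in a preference-utilitarian calculation, what enters the sum is which standing preferences a candidate outcome would satisfy, not when or whether anyone voiced them. A minimal sketch of that idea, with entirely made-up entities and weights.)

```python
# Toy preference-utilitarian scoring: utility depends on standing preferences
# and the candidate outcome, never on a log of when wishes were expressed.
# Entities, preference names and weights are all illustrative assumptions.
entities = {
    "A": {"A_replicated": 1.0, "others_dead": 0.1},
    "B": {"B_replicated": 1.0, "others_dead": 0.1},
    "C": {"stay_alive": 1.0},
}

def utility(outcome_facts, entities):
    """Sum the weights of every standing preference the outcome satisfies."""
    return sum(weight
               for prefs in entities.values()
               for pref, weight in prefs.items()
               if pref in outcome_facts)

# Note there are no timestamps anywhere: expressing a wish earlier changes nothing.
print(utility({"A_replicated", "others_dead"}, entities))  # 1.2
print(utility({"stay_alive"}, entities))                   # 1.0
```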
Even a world ruled by a utilitarian Friendly AI would allow fights between groups, where the winning strategy is to “wish for a situation where it is utilitarian to help us and to destroy our enemies”.
Again, no. Wishing for a situation as a strategic act is completely irrelevant. Preferring a situation might be, but it is very odd indeed to refer to an agent having a strategic preference… strategy is what I implement to achieve whatever my preferences happen to be. For example, if I don’t prefer to populate the universe with clones of myself, I won’t choose to adopt that preference just because adopting that preference will make me more successful at implementing it.
That said, yes, a world ruled by a utilitarian FAI will result in some groups being successful instead of others, where the winning groups are the ones whose existence maximizes whatever the FAI’s utility definition is.
In such a world, the outside-hateful, inside-loving, hugely reproducing groups with preserved preferences would have an “evolutionary advantage”, so they would gradually destroy everyone else.
If they don’t have corresponding utility-inhibiting factors, which I see no reason to believe they necessarily would, yes, that’s true. Well, not necessarily gradually… they might do so immediately. Is this important?
Indeed, a utilitarian FAI that values Fun presumably immediately gets busy constructing a species more capable of Fun than humans, with the intention of populating the universe with them instead of us.
Oh.
I would hope that the FAI would instead turn us into the species most capable of fun. But considering the remaining lifetime of the universe and all the fun the new species will have there, the difference between (a) transforming us and (b) killing us and creating the other species de novo is negligible. The FAI would probably choose the faster solution, because it would allow more total fun-time for the superhappies. If there are multiple possible superhappy designs, equivalent in their fun-capacity, the FAI would choose the one that cares about us the least, to reduce their possible regret over our extinction. Probably something very dissimilar to us (as much as the definition of “fun” allows). They would care about us less than we care about the dinosaurs.
Faster would presumably be an issue, yes. Minimizing expected energy input per unit Fun output would presumably also be an issue.
Of course, all of this presumes that the FAI’s definition of Fun doesn’t definitionally restrict the experience of Fun to 21st-century humans (either as a species, or as a culture, or as individuals).
Unrelatedly, I’m not sure I agree about regret. I can imagine definitions of Fun such that maximizing Fun requires the capacity for regret, for example.