If a single agent has conflicting desires (each of which it values equally), then it should work to alter its desires, choosing a consistent set of desires that are most likely to be fulfilled.
To your latter question though, I think that what you’re asking is “If two agents have utility functions that clash, which one is to be preferred?”
Is it that all we can say is “Whichever one has the most resources and the most optimisation power/intelligence will be able to put its goals into action and prevent the other one from fully acting upon its own goals”?
Well, I think that the point Eliezer has made a few times before is that there is no ultimate morality written into the universe that will compel any agent to act it out. You can’t reason with an agent which has a totally different utility function. The only reason that we can argue with humans is that they’re only human, and thus we share many desires. Figuring out morality isn’t going to give you the power to talk Clippy out of killing you for more paperclips. You aren’t going to show that human ‘morality’, which actualises what humans prefer, is preferable to ‘Clippy’ ethics. He is just going to kill you.
So, let’s now figure out exactly what we want most (what we would want if we had our own CEV), and then go out and do it.
Nobody else is gonna do it for us.
EDIT: Regarding ‘conflicting desires’ in the first sentence: I meant desires that are in principle unresolvable, like ‘x’ and ‘~x’. Of course, for most situations you have multiple desires that clash, and you just have to perform utility calculations to figure out what to do.
You can’t reason with an agent which has a totally different utility function. The only reason that we can argue with humans is that they’re only human, and thus we share many desires.
If you know (or correctly guess) the agent’s utility function, and are able to communicate with it, then it may well be possible to reason with it.
Consider this situation: I am captured by a Paperclipper, which wishes to extract the iron from my blood and use it to make more paperclips (incidentally killing me in the process). I can attempt to escape by promising to send the Paperclipper a quantity of iron—substantially more than can be found in my blood, and easier to extract—as soon as I am safe. As long as I can convince Clippy that I will follow through on my promise, I have a chance of living.
I can’t talk Clippy into adopting my own morality. But I can talk Clippy into performing individual actions that I would prefer Clippy to do (or into refraining from other actions) as long as I ensure that Clippy can get more paperclips by doing what I ask than by not doing what I ask.
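To make that last condition concrete, here is a minimal sketch of the comparison Clippy would be making (my own illustration, not from the comment above; the function name, yields, and probabilities are all invented):

```python
# Hypothetical illustration: Clippy takes the deal only if the expected
# paperclip yield of letting me go beats the yield from my blood iron.

def clippy_accepts(iron_in_blood_kg: float,
                   iron_promised_kg: float,
                   p_promise_kept: float,
                   clips_per_kg: float = 200.0) -> bool:
    """True if accepting the deal yields more expected paperclips than refusing."""
    clips_if_refuse = iron_in_blood_kg * clips_per_kg
    clips_if_accept = p_promise_kept * iron_promised_kg * clips_per_kg
    return clips_if_accept > clips_if_refuse

# Roughly 4 grams of iron in a human body, versus a promised 10 kg delivery
# that Clippy trusts at 60%:
print(clippy_accepts(0.004, 10.0, 0.6))  # True -- the deal wins
```

The particular numbers don’t matter; the point is that the persuasion only works because it routes through Clippy’s own utility function.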
Of course—my mistake. I meant that you can’t alter an agent’s desires by reason alone. You can’t appeal to desires you have; you can only appeal to its desires. So, when he’s going to turn your blood iron into paperclips, and you want to live, you can’t try “But I want to live a long and happy life!”. If Clippy hasn’t got empathy, and you have nothing to offer that will help fulfill his own desires, then there’s nothing to be done, other than trying to physically stop or kill him.
Maybe you’d be happier if you put him on a planet of his own, where a machine constantly destroyed paperclips and he was happy making new ones. My point is just that, if you do decide to make him happy, it’s not the optimal decision relative to a universal preference, or morality. It’s just the optimal decision relative to your desires. Is that ‘right’? Yes. That’s what we refer to when we say ‘right’.
If a single agent has conflicting desires (each of which it values equally), then it should work to alter its desires, choosing a consistent set of desires that are most likely to be fulfilled.
Hahaha no. If it doesn’t desire these other desires, then they are less likely to be fulfilled.
Figuring out morality isn’t going to give you the power to talk Clippy out of killing you for more paperclips. You aren’t going to show that human ‘morality’, which actualises what humans prefer, is preferable to ‘Clippy’ ethics. He is just going to kill you.
Well, if you could persuade him our morality is “better” by his standards—results in more paperclips—then it could work. But obviously arguing that Murder Is Wrong is about as smart as Clippy telling you that killing him would be Wrong because it results in fewer paperclips.
So, let’s now figure out exactly what we want most (what we would want if we had our own CEV), and then go out and do it. Nobody else is gonna do it for us.
Indeed. (Although “us” here includes an FAI, obviously.)
If a single agent has conflicting desires (each of which it values equally), then it should work to alter its desires, choosing a consistent set of desires that are most likely to be fulfilled.
Hahaha no. If it doesn’t desire these other desires, then they are less likely to be fulfilled.
I don’t understand… I said it has two equally valued desires? So, it doesn’t desire one over the other. So, if it desired x, y, and z equally, except that x implied (~y v ~z), while y or z (or both) implied ~x, then even though it desires x, it would be optimal to alter its desires so as to not desire x. Then it will always be happy fulfilling y and z, and not continue to be dissatisfied.
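A toy version of that calculation (my own sketch, assuming each satisfied desire is worth one util and the implications above):

```python
# Hypothetical sketch: with x -> (~y v ~z) and (y v z) -> ~x, the largest
# set of desires that can all be satisfied together excludes x.
from itertools import chain, combinations

desires = ("x", "y", "z")

def consistent(kept):
    # Satisfying x rules out satisfying y or z, and vice versa.
    return not ("x" in kept and ("y" in kept or "z" in kept))

subsets = chain.from_iterable(combinations(desires, r) for r in range(len(desires) + 1))
best = max((s for s in subsets if consistent(s)), key=len)
print(best)  # ('y', 'z'): two desires fulfilled, versus at most one if x is kept
```

Under those assumptions, any desire set that keeps x can satisfy at most one desire, so dropping x is the better self-modification.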
I was saying this in response to dspeyer saying he had two axiomatisations of morality (I took that to mean two desires, or sets of desires) which were in conflict. I was saying that there is no universal maxim against which he could measure the two—he just needs to figure out which ones will be optimal in the long term, and (attempt to) discard the rest.
Edit: Oh, I now realise I originally added the word ‘one’ to the first sentence of the earlier post you were quoting. If this was somehow the cause of confusion, my apologies.
“I value both saving orphans from fires and eating chocolate. I’m a horrible person, so I can’t choose whether to abandon my chocolate and save the orphanage.”
Should I self-modify to ignore the orphans? Hell no. If future-me doesn’t want to save orphans then he never will, even if it would cost no chocolate.
That’s a very big counterfactual hypothesis, that there exists someone who assigns equal moral weight to the statements ‘I am saving orphans from fires’ and ‘I am eating chocolate’. It would certainly show a lack of empathy—or a near self-destructive need for chocolate! In fact, the best choice for someone (if it would still be ‘human’) with those qualities in our society would be to keep the desire to save orphans, so as to retain a modicum of humanity. The only reason I suggest it would want such a modicum would be so as to survive in the human society it finds itself in (assuming it wishes to stay alive, so as to continue fulfilling desires).
Of course, this whole counter-example assumes that the two desires are equally desired, and at odds. Which is quite difficult even to imagine.
But I still think that the earlier idea, that there would be no universal moral standard against which it could compare its decision, remains. From my point of view it is certainly wrong, and evil, to choose the chocolate, but I am, alas, only human.
And I will do everything in my power to encourage the sorts of behaviour that make agents prefer saving orphans from fires to eating chocolate!!!
Hey, it doesn’t have to be orphans. Or it could be two different kinds of orphan—boys and girls, say. The boys’ orphanage is on fire! So is the nearby girls’ orphanage! Which one do you save?
Protip: The correct response is not “I self-modify to only care about one sex.”
EDIT: Also, aren’t you kind of fighting the counterfactual?
I was just talking about sets of desires that clash in principle. When you have two desires that clash over one thing, then you will act to fulfill the stronger of your desires. But, as I’ve tried to make clear, if one desire is to ‘kill all humans’ and another is ‘to save all humans’, then the best idea is to (attempt to) self-modify to have only the desire that will produce the most utility. Having both will always mean disutility.
I’m sorry, I don’t understand what you mean when you say ‘fighting the counterfactual’.
But, as I’ve tried to make clear, if one desire is to ‘kill all humans’ and another is ‘to save all humans’
...then you have a conflict. The best idea is not to cut off one of those desires, but to find out where the conflict comes from and what higher goals are giving rise to these as instrumental subgoals.
But how do you know something is a terminal value? They don’t come conveniently labelled. Someone else just claimed that not killing people is a terminal value for all “neurotypical” people, but unless they’re going to define every soldier, everyone exonerated at an inquest by reason of self defence, and every doctor who has acceded to a terminal patient’s desire for an easy exit, as non-”neurotypical”, “not killing people” bears about as much resemblance to a terminal value as a D&D character sheet does to an actual person.
I’m sorry, I don’t understand what you mean when you say ‘fighting the counterfactual’.
Try the search bar. It’s a pretty common concept here, although I don’t recall where it originated.
I was just talking about sets of desires that clash in principle. When you have two desires that clash over one thing, then you will act to fulfill the stronger of your desires. But, as I’ve tried to make clear, if one desire is to ‘kill all humans’ and another is ‘to save all humans’, then the best idea is to (attempt to) self-modify to have only the desire that will produce the most utility. Having both will always mean disutility.
Well, that disutility is only lower according to my new preferences; my old ones remain sadly unfulfilled.
More specifically, if I value both freedom and safety (for everyone), should I self-modify not to hate reprogramming others? Or not to care that people will decide to kill each other sometimes?
Hmm… I don’t think my point necessarily helps here. I meant that you will always get disutility when you have two desires that always clash (x and not x); whichever way you choose, the other desire won’t be fulfilled.
However, in the case you offered (and probably in most cases) it’s not a good idea to self-modify, as the desires don’t always clash in principle. Like with the chocolate and saving-kids one, you just have to perform utility calculations to see which way to go (in that one, it’s saving the kids).
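To make “perform utility calculations” concrete, a trivial sketch with made-up utilities (mine, purely illustrative):

```python
# Hypothetical utilities for the chocolate-versus-orphans case; the point is
# only that an ordinary clash is settled by comparison, not by self-modification.
options = {
    "save the orphans": 1000.0,
    "eat the chocolate": 1.0,
}
print(max(options, key=options.get))  # save the orphans
```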
you will always get disutility when you have two desires that always clash (x and not x); whichever way you choose, the other desire won’t be fulfilled.
Yup. And if you stop caring about one of those values, then modified!you will be happier. But you don’t care about what modified!you wants, you care about x and not-x.
I’m sorry, I don’t understand what you mean when you say ‘fighting the counterfactual’.
“Fighting the counterfactual” presumably means “fighting the hypothetical”.
Thanks.
...then you have a conflict. The best idea is not to cut off one of those desires, but to find out where the conflict comes from and what higher goals are giving rise to these as instrumental subgoals.
If you can’t, then:
You have failed.
Sucks to be you.
If you’re screwed enough, you’re screwed.
(For the record, I meant terminal values.)
But how do you know something is a terminal value? They don’t come conveniently labelled. Someone else just claimed that not killing people is a terminal value for all “neurotypical” people, but unless they’re going to define every soldier, everyone exonerated at an inquest by reason of self defence, and every doctor who has acceded to a terminal patient’s desire for an easy exit, as non-”neurotypical”, “not killing people” bears about as much resemblance to a terminal value as a D&D character sheet does to an actual person.
I was oversimplifying things. Updated now, thanks.