“I value both saving orphans from fires and eating chocolate. I’m a horrible person, so I can’t choose whether to abandon my chocolate and save the orphanage.”
Should I self-modify to ignore the orphans? Hell no. If future-me doesn’t want to save orphans then he never will, even if it would cost no chocolate.
That’s a very big counterfactual hypothesis: that there exists someone who assigns equal moral weight to ‘I am saving orphans from fires’ and ‘I am eating chocolate’. It would certainly show a lack of empathy, or a near self-destructive need for chocolate! In fact, the best choice for someone (if it would still be ‘human’) with those qualities in our society would be to keep the desire to save orphans, so as to retain a modicum of humanity. The only reason I suggest it would want such a modicum is to survive in the human society it finds itself in (assuming it wishes to stay alive, so as to continue fulfilling desires).
Of course, this whole counter-example assumes that the two desires are equally desired, and at odds. Which is quite difficult even to imagine.
But I still think the earlier point stands: there would be no universal moral standard against which it could compare its decision. It is certainly wrong, and evil, to choose the chocolate from my point of view, but I am, alas, only human.
And I will do everything in my power to encourage the sorts of behaviour that make agents prefer saving orphans from fires to eating chocolate!
Hey, it doesn’t have to be orphans. Or it could be two different kinds of orphan: boys and girls, say. The boys’ orphanage is on fire! So is the nearby girls’ orphanage! Which one do you save?
Protip: The correct response is not “I self-modify to only care about one sex.”
EDIT: Also, aren’t you kind of fighting the counterfactual?
I was just talking about sets of desires that clash in principle. When you have two desires that clash over one thing, you will act to fulfil the stronger of the two. But, as I’ve tried to make clear, if one desire is ‘kill all humans’ and another is ‘save all humans’, then the best idea is to (attempt to) self-modify to keep only the desire that will produce the most utility. Having both will always mean disutility.
I’m sorry, I don’t understand what you mean when you say ‘fighting the counterfactual’.
But, as I’ve tried to make clear, if one desire is to ‘kill all humans’ and another is ‘to save all humans’
...then you have a conflict. The best idea is not to cut off one of those desires, but to find out where the conflict comes from and what higher goals are giving rise to these as instrumental subgoals.
But how do you know something is a terminal value? They don’t come conveniently labelled. Someone else just claimed that not killing people is a terminal value for all “neurotypical” people; but unless they’re going to define every soldier, everyone exonerated at an inquest by reason of self-defence, and every doctor who has acceded to a terminal patient’s desire for an easy exit as non-“neurotypical”, “not killing people” bears about as much resemblance to a terminal value as a D&D character sheet does to an actual person.
I’m sorry, I don’t understand what you mean when you say ‘fighting the counterfactual’.
Try the search bar. It’s a pretty common concept here, although I don’t recall where it originated.
I was just talking about sets of desires that clash in principle. When you have two desires that clash over one thing, you will act to fulfil the stronger of the two. But, as I’ve tried to make clear, if one desire is ‘kill all humans’ and another is ‘save all humans’, then the best idea is to (attempt to) self-modify to keep only the desire that will produce the most utility. Having both will always mean disutility.
Well, that disutility is only lower according to my new preferences; my old ones remain sadly unfulfilled.
More specifically, if I value both freedom and safety (for everyone), should I self-modify not to hate reprogramming others? Or not to care that people will decide to kill each other sometimes?
Hmm… I don’t think my point necessarily helps here. I meant that you will always get disutility when you have two desires that always clash (x and not x); whichever way you choose, the other desire won’t be fulfilled.
However, in the case you offered (and probably most cases) it’s not a good idea to self-modify, as the desires don’t always clash in principle. As with the chocolate-versus-saving-kids example, you just have to perform utility calculations to see which way to go (there, saving the kids wins).
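The “utility calculation” described above can be sketched in a few lines, with entirely made-up weights standing in for how strongly each desire is held:

```python
# A toy sketch of weighing clashing desires: score each action by summing,
# over every desire, (strength of the desire) * (how well the action
# satisfies it). The weights and satisfaction numbers are illustrative only.

def total_utility(action, desires):
    """Sum the weighted satisfaction of every desire for one action."""
    return sum(weight * satisfaction[action]
               for weight, satisfaction in desires)

# Two desires: saving kids (held strongly) and eating chocolate (held weakly).
# Each maps an action to how well that action fulfils the desire (0.0 to 1.0).
desires = [
    (100, {"save_kids": 1.0, "eat_chocolate": 0.0}),  # saving kids, weight 100
    (1,   {"save_kids": 0.0, "eat_chocolate": 1.0}),  # chocolate, weight 1
]

actions = ["save_kids", "eat_chocolate"]
best = max(actions, key=lambda a: total_utility(a, desires))
print(best)  # -> save_kids (utility 100 vs 1)
```

With a strict clash (x and not-x) no action scores full marks on both desires, which is the “disutility either way” point above; here the weights differ enough that the choice is easy.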
you will always get disutility when you have two desires that always clash (x and not x); whichever way you choose, the other desire won’t be fulfilled.
Yup. And if you stop caring about one of those values, then modified!you will be happier. But you don’t care about what modified!you wants, you care about x and not-x.
I’m sorry, I don’t understand what you mean when you say ‘fighting the counterfactual’.
“Fighting the counterfactual” presumably means “fighting the hypo[thetical]”.
Thanks.
...then you have a conflict. The best idea is not to cut off one of those desires, but to find out where the conflict comes from and what higher goals are giving rise to these as instrumental subgoals.
If you can’t, then:
You have failed.
Sucks to be you.
If you’re screwed enough, you’re screwed.
(For the record, I meant terminal values.)
But how do you know something is a terminal value? They don’t come conveniently labelled. Someone else just claimed that not killing people is a terminal value for all “neurotypical” people; but unless they’re going to define every soldier, everyone exonerated at an inquest by reason of self-defence, and every doctor who has acceded to a terminal patient’s desire for an easy exit as non-“neurotypical”, “not killing people” bears about as much resemblance to a terminal value as a D&D character sheet does to an actual person.
I was oversimplifying things. Updated now, thanks.