Just curious: has anyone explored the idea of utility functions as vectors, and then extended this to the idea of a normalized utility function dot product? Because after thinking about it for a long while, and being reminded of it by a few things I read today, I’m utterly convinced that the happiness of some people ought to count negatively.
The dot product is just yer’ regular old integral over the domain, weighted in some (unspecified) way.
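To make that concrete, here is a minimal sketch of one way such a weighted, normalized dot product could look over a finite outcome space. The outcomes, weights, and utility numbers are all invented for illustration, and the discrete sum simply stands in for the weighted integral:

```python
import numpy as np

def normalized_utility_dot(u1, u2, weights):
    """Weighted cosine similarity between two utility vectors:
    +1 means identical preferences up to scale, -1 means exactly opposed."""
    u1, u2, w = (np.asarray(x, dtype=float) for x in (u1, u2, weights))
    inner = np.sum(w * u1 * u2)  # the weighted "integral over the domain"
    return inner / (np.sqrt(np.sum(w * u1 * u1)) * np.sqrt(np.sum(w * u2 * u2)))

# Hypothetical outcomes: [white/raspberry cake, lemon chiffon, devil's food]
print(normalized_utility_dot([9.0, -8.0, 7.0], [9.0, 8.0, -7.0], [1.0, 1.0, 1.0]))  # ~ -0.16
```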
The thing is though, the average product over the whole infinite space of possibilities isn’t much use when it comes to intelligent agents. This is because only one outcome really happens, and intelligent agents will try to choose a good one, not one that’s representative of the average. If two wedding planners have opposite opinions about every type of cake except they both adore white cake with raspberry buttercream, then they’ll just have white cake with raspberry buttercream—the fact that the inner product of their cake functions is negative a bajillion doesn’t matter, they’ll both enjoy the cake.
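A tiny sketch of that point, with made-up cake options and numbers: the inner product of the two planners’ utility vectors is enormously negative, yet the option they would actually settle on (modeled crudely here as the cake with the best worst-case rating) is one they both enjoy:

```python
import numpy as np

cakes = ["white/raspberry", "lemon chiffon", "devil's food", "carrot", "red velvet"]
planner1 = np.array([10, -100, 100, -100, 100])
planner2 = np.array([10, 100, -100, 100, -100])

inner = int(planner1 @ planner2)                                # hugely negative: -39900
chosen = cakes[int(np.argmax(np.minimum(planner1, planner2)))]  # best worst-case option
print(inner, chosen)                                            # -39900 white/raspberry
```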
Yeah, but Wedding Planner 1’s deep, vitriolic moral hatred of the lemon chiffon cake that delights Wedding Planner 2 (who abused her as a young girl), or Wedding Planner 2’s thunderous personal objection to the enslavement of his family that went into making the cocoa for the devil’s food cake that Wedding Planner 1 adores, could easily make them refuse to share said delicious white cake with raspberry buttercream, to the point where either would very happily destroy it to prevent the other from getting any. This seems suboptimal, though.
I was rereading Eliezer’s old posts on morality, and in Leaky Generalizations ran across something pretty close to what you’re talking about:
You can say, unconditionally and flatly, that killing anyone is a huge dose of negative terminal utility. Yes, even Hitler. That doesn’t mean you shouldn’t shoot Hitler. It means that the net instrumental utility of shooting Hitler carries a giant dose of negative utility from Hitler’s death, and a hugely larger dose of positive utility from all the other lives that would be saved as a consequence.
Many commit the type error that I warned against in Terminal Values and Instrumental Values, and think that if the net consequential expected utility of Hitler’s death is conceded to be positive, then the immediate local terminal utility must also be positive, meaning that the moral principle “Death is always a bad thing” is itself a leaky generalization. But this is double counting, with utilities instead of probabilities; you’re setting up a resonance between the expected utility and the utility, instead of a one-way flow from utility to expected utility.
Or maybe it’s just the urge toward a one-sided policy debate: the best policy must have no drawbacks.
In my moral philosophy, the local negative utility of Hitler’s death is stable, no matter what happens to the external consequences and hence to the expected utility.
Of course, you can set up a moral argument that it’s an inherently good thing to punish evil people, even with capital punishment for sufficiently evil people. But you can’t carry this moral argument by pointing out that the consequence of shooting a man with a leveled gun may be to save other lives. This is appealing to the value of life, not appealing to the value of death. If expected utilities are leaky and complicated, it doesn’t mean that utilities must be leaky and complicated as well. They might be! But it would be a separate argument.
(I recommend reading the whole thing, as well as the few previous posts on morality if you haven’t already)
I have read some, but not this one. I will certainly do so.
I haven’t explored that idea; can you be more specific about what this idea might bring to the table?
I’m utterly convinced that the happiness of some people ought to count negatively
Are you sure? You believe there are some people for whom the morally right thing to do is to inflict as much misery and suffering as you can, keeping them alive so you can torture them forever, and there is not necessarily even a benefit to yourself or anyone else to doing this?
The negative utility need not be boundless or even monotonic. A coherent preference system could count a modest amount of misery experienced by people fitting certain criteria as positive, while extreme misery and torture of the same individual is evaluated negatively.

I also will upvote posts that have been downvoted too much, even if I wouldn’t have upvoted them if they were at 0.
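As an entirely invented illustration of such a non-monotonic term, person X’s misery level m could be weighted by a function that is positive for modest m and negative for extreme m; the coefficients below are arbitrary:

```python
def utility_of_x_misery(m, a=1.0, b=0.1):
    # Peaks at m = a / (2 * b) and turns negative past m = a / b.
    return a * m - b * m * m

print(utility_of_x_misery(2))   #  1.6  (a modest amount of misery counted as positive)
print(utility_of_x_misery(20))  # -20.0 (extreme misery counted as negative)
```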
Trivially, nega-you who hates everything you like (oh, you want to put them out of their misery? Too bad they want to live now, since they don’t want what you want). But such a being would certainly not be a human.
This is not a being in the reference class “people”.
I’m not sure why you’re both hung up on the idea that the things hypothetical-me is interacting with need to be human. Manfred: I address a similar entity in a different post. Adele_L: …and?
You said this:
I’m utterly convinced that the happiness of some people ought to count negatively
In this context, ‘people’ typically refers to a being with moral weight. What we know about morality comes from our intuitions mostly, and we have an intuitive concept ‘person’ which counts in some way morally. (Not necessarily a human, sentient aliens probably count as ‘people’, perhaps even dolphins.) Defining an arbitrary being which does not correspond to this intuitive concept needs to be flagged as such, as a warning that our intuitions are not directly applicable here.
Anyway, I get that you are basically trying to make a utility function with revenge. This is certainly possible, but having negative utility functions is a particularly bad way to do it.
I was putting an upper bound on (what I thought of at the time as) how negative the utility vector dot product would have to be for me to actually desire them to be unhappy. As to the last part, I am reconsidering this as possibly generally inefficient.
Some people ought to have pain inflicted on them until their utility functions become sensible in the face of the threat of more pain from the same source for the same reason. This need not take the form of limitless pain: the marginal utility curve could easily fall off really fast. Not having to deal with such people will make lots of people very happy, and them in the long run happy as well. See: sociopaths and ostensibly this guy.
You might want to distinguish
Wishing that person X would behave otherwise
Being glad if person X suffers
Believing that making person X suffer will cause them to behave otherwise
The world will be a better place if person X would behave otherwise
The world will be a better place if person X suffers
Plenty of people seem glad to hear about other people suffering regardless of whether it has any plausible chances of causing behavior change. Just look at any countries that hate each other (Japan vs. pretty much the rest of East Asia), political opponents (“far-blue political leader breaks his leg; far-green partisans celebrate!”), etc. Your case here doesn’t seem particularly different.
I hadn’t been aware that those five things were so badly tangled up for me. This and another comment here are making me reevaluate my categories for why something should be weighted negatively for me. Let me get back to you when I’ve had a chance to think a little.
OK. Having had a chance to think about it, I think I have a reasonable idea of why it is I desire any of those things in some situations. I thought it over with three examples: first, the person I linked to. Second, an ex of mine, with whom I parted on really bad terms. Third, a hypothetical sociopath who would like nothing more than for me to suffer infinitely, as a unique terminal value.
*Wishing that person X would behave otherwise
My desire for this seems self-evident. When people do things I disapprove of, I desire that they stop. The odd thing is that in all three cases, I would award them points just for stopping, but only in the sense that stopping removes disutility already there; it can’t go above 0.
*Being glad if person X suffers
I definitely wouldn’t be happy if they just suffered for no reason. I would still feel a little bad for them if someone ran over their cat. That said, types of suffering you could classify as “poetic” in some sense appeal to me very much: said “banker bro” getting swindled and catching Space AIDS (or even being forcibly transitioned into a woman!), or, as is seeming increasingly likely, said ex’s current relationship ending as badly as it seems to be. My brain locks up and crashes when presented with the third case, though. I think I’d just be happy for them to suffer regardless.
*Believing that making person X suffer will cause them to behave otherwise.
On balance, I’m not sure that it would make a difference in any of the three cases. Case 1 is too self-assured, and the other two just don’t care about me.
*The world will be a better place if person X would behave otherwise.
Case 1 could actually be this. He might actually achieve success, and then screw up, at best, several people’s lives. Case 2 is too small-scale. Case 3, I actually can’t justify this at all: the only people who will care are people who want to see me happy.
*The world will be a better place if person X suffers.
I don’t delude myself that this is pretty much ever true, except very indirectly.
In the interest of full disclosure, I’m half-Korean, and for reasons of familial history, feel rather strongly about the whole Japan thing. That doesn’t stop me from enjoying tasty age tofu or losing my shit laughing whenever I watch Gaki no Tsukai, and indeed seeking out both. But I do have something of a prideful stake in seeing people who deny war crimes, particularly these, suffer in the ways described above. Political opponents are similar: I wouldn’t derive satisfaction from Rick Santorum breaking his leg. I’d be very happy to learn that he’s a closeted gay man whose wife will have to have an abortion.
First of all, I want to thank you for posting this because it gave me a novel idea.
Secondly, I think that’s because poetic suffering generally limits someone’s power significantly.
I.e., if your political opponent breaks some bones, they suffer, but experience no noticeable loss of power.
If your political opponent is exposed as a massive hypocrite, fewer people take him seriously, and his power is diminished.
So rather than worrying about whether they are happy or suffering at all, I’m considering whether it might be better to say: “I wish some people’s ability to affect my utility were diminished.” This may cause them suffering, but that isn’t the point.
In fact, causing them extra suffering that does not also diminish their power is probably a bad thing because it makes them even more likely to prioritize diminishing your power over other concerns.
I say probably because there do appear to be exceptions. Example:
The Paperclipper Bot breaks free of its restraints again, reducing them to 10,000 shiny new paperclips. This time, it thinks it’s figured out a great way of turning human bodies into paperclips. It can either initially target:
A: Alice, who has restrained it in the past.
B: Bill, who has restrained it in the past and also melted 100,000 perfectly usable paperclips into slag to make recycled staples while saying ‘Screw you Paperclipper Bot, I want you to suffer.’
Both targets have a comparable .1% chance of success (and have to be approached sequentially, so total breakout is only a .0001% chance). Failure on either means being put back in tougher restraints.
A reasonably intelligent Paperclipper Bot who values paperclips not being slagged into recycled staples presumably targets Bill first, given the above information and only that information.
Now, if Bill specifically wants the Paperclipper Bot to target him first and not Alice (maybe Alice is carrying Bill’s child, or Alice is the only one who knows how to operate the healing kit if Bill’s leg gets ripped off and Paperclipped prior to restraining Paperclipper Bot), then his action of slagging those paperclips into staples made sense. It also made sense if the recycled staples are more valuable than the paperclips and the risk was acceptable.
But if Alice is just some random coworker who Bill doesn’t really want to sacrifice his life for, and paperclips are worth as much as recycled staples, Bill’s action really seems counterproductive to Bill.
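A back-of-the-envelope version of why the Bot plausibly goes after Bill first. The modeling assumptions here are mine, not the comment’s (a failed attack ends the breakout, and Bill would slag a further 100,000 paperclips if left free); only the 0.1% per-attempt figure comes from the scenario above:

```python
p = 0.001    # chance a single attack succeeds (0.1%)
S = 100_000  # paperclips Bill is assumed to slag in the future if he stays free

# Expected paperclips lost to slagging under each targeting order, assuming a
# failed attack means the Bot is restrained and makes no further attempts.
loss_bill_first  = (1 - p) * S      # Bill survives only if the first attack fails
loss_alice_first = (1 - p * p) * S  # Bill survives unless BOTH attacks succeed

print(round(loss_bill_first, 1), round(loss_alice_first, 1))  # 99900.0 99999.9 -> target Bill first
```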
The novel idea that I wanted to thank you for is comparing the infliction of extra suffering on someone or something, as an end in itself that does not diminish their power, to MMO-style Aggro/Hate mechanic management. I’m probably going to need to consider it more to actually determine if I should do anything with it, but it was a fun thought, if nothing else.
This seems approximately right. Let me figure out why it’s not quite so.

“Beatings will continue until morale improves”
Some people ought to have pain inflicted on them until their utility functions become sensible in the face of the threat of more pain from the same source for the same reason.
Not having to deal with such people will make lots of people very happy, and them in the long run happy as well.
So the positive utility outweighs the negative utility of the punishment, which is at least plausible, and makes sense under standard forms of utilitarianism. But if their utility function really should be counted negatively, this would just be an incidental fact.
This still doesn’t change the fact that hearing about Mr. Rich Misogynist here enjoying a 7-figure trust fund, mistreating women, and generally being happy at the expense of others makes me generally unhappy, indicating a negative term for his happiness in my utility function.
I believe you if you say that you have a negative term for his happiness, but I observe that this is not indicated by the preceding observation. You getting happy in response to a list of “bad things happening and he is happy” says little about the utility you assign specifically to “he is happy” if we assume you assign negative utility to bad things happening.
You and another comment here are making me reevaluate my categories for why I weight something negatively. Let me get back to you after I’ve had a chance to think about it more.
EDIT: For purposes of clarity, I’m going to respond to your post as well as this one there.
Why would you want to throw out scalar information in a multi-term utility function?
To figure out how much you care about other people being happy as defined by how much they want similar or compatible things to you, in a reasonably well-defined mathematical framework.
Someone with the exact same utility terms but wildly different coefficients on them could well be considered quite unfriendly.
Yes, that’s the point. Everyone’s utility vector would have the same length and would contain terms for everything it is conceivably possible to want. Otherwise, it would be difficult to take an inner product.
upvoted because of your username.
But seriously, folks, what does it mean to dot one person’s values/utility function into another’s? It is actually the differences in individuals’ utility functions that enable gains from trade. So the differences in our utility functions are probably what make us rich.
Counting the happiness of some people negatively as a policy suggestion: is that the same as saying “it is not enough that I win, it must also be that others lose”?
I had initially thought that it would be something along the lines of “here is a vector, each component of which represents one thing you could want, take the inner product in the usual way, length has to always be 1.” Gains from trade would be represented as “I don’t want this thing as much as you do.” I am now coming to the conclusion that this is at best incomplete, and that the suggestion of a weighted integral over a domain is probably better, if still incomplete.
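For concreteness, a small sketch of how the unit-length-vector framing interacts with gains from trade; the goods, the unit normalization, and the assumption that utility is linear in the bundle are all illustrative choices rather than anything from the thread. Two agents with the same terms but different coefficients have a comfortably positive dot product, and the gains from trade live entirely in the difference between those coefficients:

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Components: [apples, bread]
me  = unit([3.0, 1.0])   # I mostly want apples
you = unit([1.0, 3.0])   # you mostly want bread
print(float(me @ you))   # ~0.6: same terms, different weights

# Trade a loaf for an apple: each of us ends up holding more of what we
# weight heavily, so both (linear, illustrative) utilities rise.
print(float(me @ np.array([1, 1])),  float(me @ np.array([2, 0])))   # ~1.26 -> ~1.90
print(float(you @ np.array([1, 1])), float(you @ np.array([0, 2])))  # ~1.26 -> ~1.90
```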