Thanks, I think I understand better. We have some progress here:
We agree that the naive model of a selfish person who doesn’t have any interest in helping others hardly ever describes real people.
We seem to agree that guilt-aversion as a desire doesn’t make sense, but maybe for different reasons.
I think it doesn’t make sense because when I say someone desires X, I mean that they prefer worlds with property X over worlds lacking that property, and I’m only interested in X’s that describe the part of the world outside of their own thought process. For the purposes of figuring out what someone desires, I don’t care whether they want it because of guilt-aversion, or because they’re hungry, or because of some other motive; all I care about is that I expect them to make some effort to make it happen, given the opportunity, and taking into account their (perhaps false) model of how the world works.
Maybe I do agree with you enough on this that the difference is unimportant. You said:
If the aim is actually guilt-aversion, this collapses back to position 1), because the person must admit to themselves that other people’s desires are only a correlate of what they want (which is to not feel guilty).
I think you’re assuming here that people who claim a desire to help people and are really motivated by guilt-aversion are ineffective. I’m not sure that’s always true. Certainly, if they’re ineffective at helping people due to their own internal process, in practice they don’t really want to help people.
You also said:
Has a desire to help others, and pursues it in good faith, using some definition of which universes are preferable that does not weight their own desires over the desires of others.
I don’t know what it means to “weight their own desires over the desires of others”. If I’m willing to donate a kidney but not donate my only liver, and the potential liver recipient desires to have a better liver, have I weighted my own desires over the desires of others? Maybe you meant “weight their own desires to the exclusion of the desires of others”.
We might disagree about what it means to help others. Personally, I don’t care much about what people want. For example, I have a friend who is alcoholic. He desires alcohol. I care about him and have provided him with room and board in the past when he needed it, but I don’t want him to get alcohol. So my compassion for him is about me wanting to move the world (including him) to the place I want it to go, not some compromise between my desires and his desires.
So I want what I want, and my actions are based on what I want. Some of the things I want give other people some of the things they want. Should it be some other way?
Now a Friendly AI is different. When we’re setting up its utility function, it has no built-in desires of its own, so the only reasonable thing for it to desire is some average of the desires of whoever it’s being Friendly toward. But you and I are human, so we’re not like that—we come into this with our own desires. Let’s not confuse the two and try to act like a machine.
Yes, I think we’re converging onto the interesting disagreements.
I think you’re assuming here that people who claim a desire to help people and are really motivated by guilt-aversion are ineffective. I’m not sure that’s always true. Certainly, if they’re ineffective at helping people due to their own internal process, in practice they don’t really want to help people.
This is largely an empirical point, but I think we differ on it substantially.
I think if people don’t think analytically, and even a little ruthlessly, they’re very ineffective at helping people. The list of failure modes is long. People prefer to help people they can see at the expense of those out of sight who could be helped more cheaply. They’re irrationally intolerant of uncertainty of outcome. They’re not properly sensitive to scale. I haven’t cited these points, but hopefully you agree. If not, we can dig a little deeper into them.
I don’t know what it means to “weight their own desires over the desires of others”. If I’m willing to donate a kidney but not donate my only liver, and the potential liver recipient desires to have a better liver, have I weighted my own desires over the desires of others? Maybe you meant “weight their own desires to the exclusion of the desires of others”.
I just meant that self-utility doesn’t get a huge multiplier when compared against others-utility. In the transplant donation example, you get just as much out of your liver as whoever you might give it to. So you’d be going down N utilons and they’d be going up N utilons, and there would be a substantial transaction cost of M utilons. So liver donation wouldn’t be a useful thing to do.
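To spell out that arithmetic (N and M here are just the placeholders above, not real figures):

$$\Delta U_{\text{total}} = \underbrace{(+N)}_{\text{recipient gains}} + \underbrace{(-N)}_{\text{donor loses}} - \underbrace{M}_{\text{transaction cost}} = -M < 0,$$

so the transfer is a wash before costs and a net loss after them.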
In another example, imagine that donating all your organs could save, say, 10 lives. I wouldn’t do that either. There are two angles here.
The first is about strategy. You don’t improve the world by being a sucker who can be taken advantage of. You do have to fight your corner too; otherwise you just promote free-riding. If all the do-gooders get their organs harvested, the world is probably not better off.
But even if extremes of altruism were not anti-strategic, I can’t say I’d do them either. There are lots of actions which, I’d have to admit, involve an extreme loss of self-utility but an extreme gain in net utility, and which I don’t carry out. These actions are still moral; it’s just that they’re more than I’m willing to do. Some people are excessively uncomfortable about this, and so give up on the idea of trying to be more moral altogether. This is to make the perfect the enemy of the good. Others are uncomfortable about it and try to twist their definition of morality into knots to conform to what they’re willing to do.
The moral ideal is to have a self-utility weight of 1.0: i.e., you’re completely impartial to whether the utility is going to you as opposed to someone else. I don’t achieve this, and I don’t expect many other people do either.
But being able to set this selfishness constant isn’t a get-out-of-jail-free card. I have to think about the equation, and how selfish the action would imply I really am. For instance, as an empirical point, I believe that eating meat given the current practices of animal husbandry demands a very high selfishness constant. I can’t reconcile being that selfish with my self-image, and my self-image is more important to me than eating meat. So, vegetarianism, with an attempt to minimise dairy consumption, but not strict veganism, even though veganism is more moral.
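As a rough sketch of the bookkeeping I have in mind here (the function names and all the numbers are mine and purely illustrative):

```python
# A rough sketch of the "selfishness constant" bookkeeping.
# All names and numbers are illustrative, not anyone's real figures.

def weighted_value(self_delta: float, others_delta: float, selfishness: float) -> float:
    """Value of an action when your own utility change is multiplied by
    `selfishness` before being added to everyone else's. A selfishness of
    1.0 is the impartial ideal; larger values privilege yourself."""
    return selfishness * self_delta + others_delta

def selfishness_implied_by_refusing(self_cost: float, others_gain: float) -> float:
    """If refusing an action spares you `self_cost` utilons but denies others
    `others_gain` utilons, refusing only comes out ahead when your
    selfishness constant exceeds this ratio."""
    return others_gain / self_cost

# Made-up example: skipping a sacrifice that costs me 1 utilon but would
# give others 20 implies I weight my own utility at least 20x theirs.
print(selfishness_implied_by_refusing(self_cost=1.0, others_gain=20.0))  # 20.0
```

The point of the second function is that refusing an action pins down a lower bound on how selfish I’m being, and that bound is the number I then have to square with my self-image.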
We might disagree about what it means to help others. Personally, I don’t care much about what people want. For example, I have a friend who is alcoholic. He desires alcohol. I care about him and have provided him with room and board in the past when he needed it, but I don’t want him to get alcohol. So my compassion for him is about me wanting to move the world (including him) to the place I want it to go, not some compromise between my desires and his desires.
Yes, there are problems with preference utilitarianism. I think some people try to get around the alcoholic example by saying something like “if their desires weren’t being modified by their alcoholism they would want x, and would want you to act as though they wanted x, so those are the true preferences.” As I write this, though, it seems like that has to be some kind of strawman, since the idea of some Platonic “true preferences” is quite visibly flawed. There’s no way to distinguish the class of preference-modifiers that includes things like alcoholism from the other preference-modifiers that together constitute a person.
I use preferences because it works well enough most of the time, and I don’t have a good alternate formulation. I don’t actually think the specifics of the metric being maximised are usually that important. I think it would be better to agree on desiderata for the measure—properties that it ought to exhibit.
Anyway. What I’m trying to say is a little clearer to me now. I don’t think the key idea is really about meta-ethics at all. The idea is just that almost everyone follows a biased, heuristic-based strategy for satisfying their moral desires, and that this strategy isn’t actually very productive. It satisfies heuristics like “I am feeling guilty, which means I need to help someone now”, but it doesn’t do very well at scratching the deeper itch: the desire to believe you genuinely make a difference.
So the idea is just that morality is another area where many people would benefit from deploying rationality. But this one’s counter-intuitive, because it takes a rather cold and ruthless mindset to carry it through.
Okay, I agree that what you want to do works most of the time, and we seem to agree that you don’t have a good solution to the alcoholism problem, and we also seem to agree that acting from a mishmash of heuristics without any reflection or attempt to make a rational whole will very likely flounder around uselessly.
Not to imply that our conversation was muddled by the following, but: we can reformulate the alcoholism problem to eliminate the addiction. Suppose my friend heard about that reality show guy who was killed by a stingray and wanted to spend his free time killing stingrays to get revenge. (I heard there are such people, but I have never met one.) I wouldn’t want to help him with that, either.
There’s a strip of an incredibly over-the-top vulgar comic called Space Moose that gets at the same idea. These acts of kindness aren’t positive utility, even if the utility metric is based on desires, because they conflict with the desires of the stingrays or other victims. Preferences also need to be weighted somehow in preference utilitarianism, I suppose by importance to the person. But then hmm, anyone gets to be a utility monster by just really, really, really, really wanting to kill the stingrays. So yeah, there’s a problem there.
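Here’s a toy version of that failure mode, with made-up parties and intensities (it’s not a serious model, it just shows how an uncapped weight swamps the sum):

```python
# Toy illustration of the utility-monster problem under weighted preferences.
# Each preference is scaled by a self-reported intensity; with no cap on
# intensity, one claimant can swamp everyone else.

preferences = {
    # outcome: {party: (intensity, satisfaction if this outcome happens)}
    "kill_stingrays":  {"avenger": (1_000_000, +1.0), "stingrays": (1, -1.0)},
    "spare_stingrays": {"avenger": (1_000_000, -1.0), "stingrays": (1, +1.0)},
}

def weighted_total(outcome: str) -> float:
    """Sum of intensity-weighted satisfaction across all parties."""
    return sum(intensity * satisfaction
               for intensity, satisfaction in preferences[outcome].values())

# The avenger "wins" purely by reporting an enormous intensity.
best = max(preferences, key=weighted_total)
print(best, weighted_total(best))  # kill_stingrays 999999.0
```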
I think I need to update, and abandon preference utilitarianism even as a useful correlate of whatever the right measure would be.
While it’s gratifying to win an argument, I’d rather not do it under false pretenses:
But then hmm, anyone gets to be a utility monster by just really, really, really, really wanting to kill the stingrays.
We need a solution to the utility monster problem if we’re going to have a Friendly AI that cares about people’s desires, so it’s better to solve the utility monster problem than to give up on preference utilitarianism in part because you don’t know how to solve it. I’ve sketched proposed solutions to two types of utility monsters: one where a single entity has an enormous utility, and one where a very large number of entities each have modest utility. If these putative solutions seem wrong to you, please post bugs, fixes, or alternatives as replies to those comments.
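(For concreteness, here is one generic mitigation that gets discussed for the single-monster case. It is not the proposal I’m referring to above, just an illustration: give each person a fixed budget of preference weight, so inflating one reported intensity dilutes that person’s say on everything else.)

```python
# Generic illustration only (not the proposal referenced above): give every
# person the same fixed "budget" of preference weight, so reporting a huge
# intensity for one outcome just dilutes that person's say on everything else.

def normalize(intensities: dict[str, float], budget: float = 1.0) -> dict[str, float]:
    """Rescale a person's reported intensities so their absolute values
    sum to `budget`."""
    total = sum(abs(v) for v in intensities.values())
    return {outcome: budget * v / total for outcome, v in intensities.items()}

avenger  = normalize({"kill_stingrays": 1_000_000.0, "eat_lunch": 1.0})
stingray = normalize({"kill_stingrays": -1.0})

# After normalization the avenger's vote for killing is capped near 1.0,
# so it no longer swamps a single opponent's vote against it.
print(avenger["kill_stingrays"] + stingray["kill_stingrays"])  # ~ -1e-06
```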
I agree that preference utilitarianism has the problem that it doesn’t free you from choosing how to weight the preferences. It also has the problem that you have to separate yourself into two parts, the part that gets to have its preference included in the weighted sum, and the part that has a preference that is the weighted sum. In reality there’s only one of you, so that distinction is artificial.