I don’t think this argument works. The normal world is mediocristan, so the “humans have non-overlapping utility functions” musing is off-topic, right?
In normal world mediocristan, people have importantly overlapping concerns—e.g. pretty much nobody is in favor of removing all the oxygen from the atmosphere.
But it’s more than that: people actually intrinsically care about each other, and about each other’s preferences, for their own sake, by and large. (People tend to be somewhat corrigible to each other, Yudkowsky might say?) There are in fact a few (sociopathic) people who have a purely transactional way of relating to other humans—they’ll cooperate when it selfishly benefits them to cooperate, they’ll tell the truth when it selfishly benefits them to tell the truth, and then they’ll lie and stab you in the back as soon as the situation changes. Those people are really, really bad, and if the population of Earth grows exclusively via the addition of that kind of person, I very strongly do not want that. Having ≈1% of the population with that personality is damaging enough already; if 90% of the human population were like that (as in your 10×’ing scenario), I shudder to imagine the consequences.

Sorry if I’m misunderstanding :)
the “humans have non-overlapping utility functions” musing is off-topic, right?
I don’t think it’s off-topic, since the central premise of Joe Carlsmith’s post is that humans might have non-overlapping utility functions, even upon reflection. I think my comment is simply taking his post seriously, and replying to it head-on.
Separately, I agree there’s a big question about whether humans have “importantly overlapping concerns” in a sense that is important and relevantly different from AI. Without wading too much into this debate, I’ll just say: I agree human nature occasionally has some kinder elements, but mostly I think the world runs on selfishness. As Adam Smith wrote, “It is not from the benevolence of the butcher, the brewer, or the baker, that we expect our dinner, but from their regard to their own interest.” And of course, AIs might have some kinder elements in their nature too.
If you’re interested in a slightly longer statement of my beliefs about this topic, I recently wrote a post that addressed some of these points.
the “humans have non-overlapping utility functions” musing is off-topic, right?
I don’t think it’s off-topic, since the central premise of Joe Carlsmith’s post is that humans might have non-overlapping utility functions, even upon reflection. I think my comment is simply taking his post seriously, and replying to it head-on.
I mean, I’m quite sure that it’s false, as an empirical claim about the normal human world, that the normal things Alice chooses to do will tend to make a random other person Bob worse off on average, as judged by Bob himself, including upon reflection. I really don’t think Joe was trying to assert the contrary in the OP.
Instead, I think Joe was musing that if Alice FOOMed to dictator of the universe, and tiled the galaxies with [whatever], then maybe Bob would be extremely unhappy about that, comparably unhappy to if Alice was tiling the galaxies with paperclips. And vice-versa if Bob FOOMed to dictator of the universe. And that premise seems at least possible, as far as I know.
This seemed to be a major theme of the OP—see the discussions of “extremal Goodhart”, and “the tails come apart”—so I’m confused that you don’t seem to see that as very central.
I agree human nature occasionally has some kinder elements, but mostly I think the world runs on selfishness. As Adam Smith wrote, “It is not from the benevolence of the butcher, the brewer, or the baker, that we expect our dinner, but from their regard to their own interest.”
I’m not sure how much we’re disagreeing here. I agree that the butcher and brewer are mainly working because they want to earn money. And I hope you will also agree that if the butcher and brewer and everyone else were selfish to the point of being sociopathic, it would be a catastrophe. Our society relies on the fact that there are just not many people, as a proportion of the population, who will flagrantly and without hesitation steal, lie, commit fraud, and murder whenever they’re sufficiently confident they can get away with it without a reputation hit or other selfishly-bad consequences. The economy (and world) relies on some minimal level of trust between employees, coworkers, business partners and so on, trust that they will generally follow norms and act with a modicum of integrity, even when nobody is looking. The reason that scams and frauds can get off the ground at all is that there is in fact a prevailing ecosystem of trust that they can exploit. Right?
This seemed to be a major theme of the OP—see the discussions of “extremal Goodhart”, and “the tails come apart”—so I’m confused that you don’t seem to see that as very central.
I agree that a large part of Joe’s post was about the idea that human values diverge in the limit. But I think if you take the thing he wrote about human selfishness seriously, then it is perfectly reasonable to talk about the ordinary cases of value divergence too, which I think are very common. Joe wrote,
And we can worry about the human-human case for more mundane reasons, too. Thus, for example, it’s often thought that a substantial part of what’s going on with human values is either selfish or quite “partial.” That is, many humans want pleasure, status, flourishing, etc for themselves, and then also for their family, local community, and so on. We can posit that this aspect of human values will disappear or constrain itself on reflection, or that it will “saturate” to the point where more impartial and cosmopolitan values start to dominate in practice – but see above re: “convenient and substantive empirical hypothesis” (and if “saturation” helps with extremal-Goodhart problems, can you make the AI’s values saturate, too?).
I mean, I’m quite sure that it’s false, as an empirical claim about the normal human world, that the normal things Alice chooses to do will tend to make a random other person Bob worse off on average, as judged by Bob himself, including upon reflection.
I think you’re potentially mixing up two separate claims that have different implications. I’m not saying that people rarely act in such a way that makes random strangers better off. In addition to my belief in a small yet clearly real altruistic element to human nature, there’s the obvious fact that the world is not zero-sum, and people routinely engage in mutually beneficial actions that make both parties in the interaction better off.
I am claiming that people are mostly selfish, and that the majority of economic and political behavior seems to be the result of people acting in their own selfish interests, rather than mostly out of the kindness of their heart. (Although by selfishness I’m including concern for one’s family and friends; I’m simply excluding concern for total strangers.)
That is exactly Adam Smith’s point: it is literally in the baker’s self-interest to sell us dinner. Even if the baker were completely selfish, they’d still sell us food. Put yourself in their shoes: if you were entirely selfish and had no regard for the preferences of other people, wouldn’t you still try to obey the law and engage in trade with other people?
You said that, “if the butcher and brewer and everyone else were selfish to the point of being sociopathic, it would be a catastrophe.” But I already live in a world where the average person donates very little to charity. The world you are describing is like mine, simply more extreme along this axis. In such a world, maybe charitable donations go from 3% of GDP to 0%, but presumably we’d still obey laws and trade with each other to get our dinner, because those things are mainly social mechanisms we use to coordinate our mostly selfish values.
The economy (and world) relies on some minimal level of trust between employees, coworkers, business partners and so on, trust that they will generally follow norms and act with a modicum of integrity, even when nobody is looking.
Trust is also valuable selfishly, if you can earn it from others, or if you care about not being deceived by others yourself. Again, put yourself in the shoes of a sociopath: is it selfishly profitable, in expectation, to go around and commit a lot of fraud? Maybe if you can get away with it with certainty. But most of the time, people can’t be certain they’ll get away with it, and the consequences of getting caught are quite severe.
Is it selfishly profitable to reject the norm punishing fraudsters? Maybe if you can ensure this rejection won’t come back to hurt you. But I don’t think that’s something you can always ensure. The world seems much richer and better off, even from your perspective, if we have laws against theft, fraud, and murder.
It is true in a literal sense that selfish people have no incentive to tell the truth about things that nobody will ever find out about. But the world is a repeated game. Many important truths about our social world are things that will at some point be exposed. If you want to have power, there’s a lot of stuff you should not lie about, because it will hurt you, even selfishly, in the eyes of other (mostly selfish) people.
Selfishness doesn’t mean stupidity. Going around flagrantly violating norms, stealing from people, and violating everyone’s trust doesn’t actually increase your own selfish utility. It hurts you, because people will be less likely to want to deal with and trade with you in the future. It also usually helps you to uphold these norms, even as a selfish person, because like everyone else, you don’t want to be the victim of fraud either.
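This repeated-game point can be sketched as a toy iterated prisoner’s dilemma (the payoff numbers and strategies here are the standard textbook illustration, not anything from the discussion): against agents who reciprocate, an always-defect strategy earns less than simple cooperation, because one round of exploitation forfeits the surplus from all future trade.

```python
# Toy iterated prisoner's dilemma: why flagrant defection is selfishly costly
# when others reciprocate. Standard illustrative payoffs:
# both cooperate -> 3 each; both defect -> 1 each; lone defector -> 5, victim -> 0.

PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's last move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def always_cooperate(opponent_history):
    return "C"

def play(strategy_a, strategy_b, rounds=100):
    """Return total payoffs for two strategies over repeated play."""
    seen_by_a, seen_by_b = [], []  # each side's record of the opponent's moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b

defector_total, _ = play(always_defect, tit_for_tat)
cooperator_total, _ = play(always_cooperate, tit_for_tat)
print(defector_total, cooperator_total)  # 104 vs 300: defection pays once, then the surplus is gone
```

Obviously this is a cartoon, but it captures the mechanism: the defector grabs 5 in round one, gets 1 per round thereafter, and ends far behind the cooperator, so even a purely selfish agent facing reciprocators has reason to keep cooperating.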
I don’t think this argument works. The normal world is mediocristan, so the “humans have non-overlapping utility functions” musing is off-topic, right?
That would be true if extremes were the only possible source of value divergence, but they are not. You can see from politics that ordinary people in ordinary situations have diverging values.