Non-positional, mutually-satisfiable values (physical luxury, for instance)
Positional, zero-sum social values, such as wanting to be the alpha male or the homecoming queen
All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict.
David Friedman pointed out that this isn’t correct; it’s actually quite easy to make positional values mutually satisfiable:
It seems obvious that, if one’s concern is status rather than real income, we are in a zero sum game....
Like many things that seem obvious, this one is false. It is true that my status is relative to yours. It does not, oddly enough, follow that if my status is higher than yours, yours must be lower than mine, or that if my status increases someone else’s must decrease. Status is not, in fact, a zero sum game.
This point was originally made clear to me when I was an undergraduate at Harvard and realized that Harvard had, in at least one interesting way, the perfect social system: Everyone at the top of his own ladder. The small minority of students passionately interested in drama knew perfectly well that they were the most important people at the university; everyone else was there to provide them with an audience....
Being a male nurse is not a terribly high status job—but that may not much matter if you are also King of the Middle Kingdom. And the status you get by being king does not reduce the status of the doctors who know that they are at the top of the medical ladder and the nurses at the bottom.
[Emphasis mine]
A FAI could simply make sure that everyone is a member of enough social groups that everyone has high status in some of them. Positional goals can be mutually satisficed, if one is smart enough about it. Those two types of value don’t differ as much as you seem to think they do. Positional goals just require a little more work to make implementing them conflict-free than the other type does.
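To make the “everyone at the top of his own ladder” idea a bit more concrete, here is a minimal, purely illustrative sketch. The people, ladders, and scores below are invented for the example, and nothing about it is an actual FAI or CEV mechanism; it only shows that, with enough distinct ladders, a simple assignment rule can put almost everyone at or near the top of some ladder.

```python
from collections import defaultdict

# Hypothetical data: each person's standing on a few status "ladders".
# The names and scores are made up purely for illustration.
people = {
    "Alice": {"drama": 9, "medicine": 2, "chess": 5},
    "Bob":   {"drama": 3, "medicine": 8, "chess": 4},
    "Carol": {"drama": 4, "medicine": 5, "chess": 9},
    "Dave":  {"drama": 2, "medicine": 9, "chess": 3},
}

def assign_to_ladders(people):
    """Assign each person to the ladder on which they rank best relative to others.

    With enough distinct ladders, most people end up at or near the top of
    *some* ladder, which is the point of the example: status is relative
    within a group, but the groups themselves can be multiplied.
    """
    ladders = {skill for scores in people.values() for skill in scores}
    # rank[ladder][person] = position from the top on that ladder (0 = best)
    rank = {
        ladder: {
            name: sorted(people, key=lambda p: -people[p][ladder]).index(name)
            for name in people
        }
        for ladder in ladders
    }
    assignment = defaultdict(list)
    for name in people:
        best_ladder = min(ladders, key=lambda lad: rank[lad][name])
        assignment[best_ladder].append(name)
    return dict(assignment)

print(assign_to_ladders(people))
# -> {'drama': ['Alice'], 'medicine': ['Bob', 'Dave'], 'chess': ['Carol']}
```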
If you extract a set of consciously-believed propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an “improved” logic, you can’t claim that it has the same values, since it will behave differently.
I don’t think I agree with this. Couldn’t you take that argument further and claim that if I undergo some sort of rigorous self-improvement program in order to better achieve my goals in life, then I must now have different values? In fact, couldn’t you just as easily say that I am behaving pointlessly, since I’m not achieving my values better, only changing them? It seems likely that most of the things you are describing as values aren’t really values; they’re behaviors. I’d regard values as more “the direction in which you want to steer the world,” both in terms of your external environment and your emotional states. Behaviors are things you do, but they aren’t necessarily what you really prefer.
I agree that a more precise and articulate definition of these terms might be needed to create a FAI, especially if human preferences are part of a network of some sort, as you claim, but I do think that these distinctions cleave reality at the joints.
I can’t really see how you can attack CEV by this route without also attacking any attempt at self-improvement by a person.
A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry. Many human values horrify most people on this list, so they shouldn’t be trying to preserve them.
The fact that these values seem to change or weaken as people become wealthier and better educated indicates that they probably are poorly extrapolated values. Most of these people don’t really want to do these things; they just think they do because they lack the cognitive ability to see otherwise. This is underscored by the fact that these people, when called out on their behavior, often make up some consequentialist justification for it (“if I don’t do it, God will send an earthquake!”).
I’ll use an example from my own personal experience to illustrate this: when I was little (around ages 2 to 5), I thought horror movies were evil because they scared me. I didn’t want to watch horror movies or even be in the same room with a horror movie poster. I thought people should be punished for making such scary things. Then I got older, learned about freedom of speech, and realized that I had no right to have people arrested just because they scare me.
Then I got even older and started reading movie reviews. I became a film connoisseur and grew sick of hearing about incredible classic horror movies that I couldn’t watch because they scared me. I forced myself to sit through Halloween, A Nightmare on Elm Street, and The Grudge, and soon I was able to enjoy horror movies like a normal person.
Not watching horror movies and punishing the people who made them were the preferences of young me. But my CEV turned out to be “Watch horror movies and reward the people who create them.” I don’t think this was random value drift; I think that I always had the potential to love horror movies and would have loved them sooner if I’d had the guts to sit down and watch them. The younger me didn’t have different terminal values; his values were just poorly extrapolated.
I think most of the people you mention are in the same position I was. I think their values are wrong, and that they themselves would recognize this if they weren’t irrational and could pierce through their cloud of self-deception. I think a CEV would extrapolate this.
But even if I’m wrong, if there’s a Least Convenient Possible world where there are otherwise normal humans who have “kill all gays” irreversibly and directly programmed into their utility function, I don’t think a CEV of human morality would take that into account. I tend to think that, from an ethical standpoint, malicious preferences (that is, preferences where frustrating someone else’s desires is an end in itself, rather than a byproduct of competing for limited resources) deserve zero respect. I think that a CEV which properly extrapolated human ethics would realize this. It might not hurt to be extra careful about that when programming a CEV, however.
I don’t think this was random value drift; I think that I always had the potential to love horror movies and would have loved them sooner if I’d had the guts to sit down and watch them.
I had a somewhat similar experience growing up, although a few details are different (I never thought people should be banned from making such films or that they were evil things just because they scared me, for instance, and I made the decision to try watching some of them, mostly Alien and a few other works from the same general milieu, at a much younger age and for substantially different reasons). However, I didn’t wind up loving horror movies; I wound up liking one or two films that only pushed my buttons in nice, predictable places and without actually squicking me per se. I honestly still don’t get how someone can sit through films like Halloween or Friday the 13th—I mean, I get the narrative underpinnings and some of the psychological buttons they push very well (reminds me of ghost tales and other things from my youth), but I can’t actually feel the same way as your putative “normal person” when sitting through it. Even movies most people consider “very tame” or “not actually scary” make me too uncomfortable to want to sit through them, a good portion of the time. And I’ve actively tried to cultivate this, not for its own sake (I could go my whole life never sitting through such a film again and not be deprived, even one of the ones I’ve enjoyed many times) but because of the small but notable handful of horror-themed movies that I do like and the number of people I know who enjoy such films with whom I’d have even more social-yay if I did self-modify to enjoy those movies. It simply didn’t take—after much exposure and effort, I now find most such films both squicky and actively uninteresting. I can see why other people like ’em, but I can’t relate.
Are my terminal values “insufficiently extrapolated?” Or just not coherent with yours?
Are my terminal values “insufficiently extrapolated?” Or just not coherent with yours?
I don’t think it’s either. We both have the general value “experience interesting stories”; it’s just expressed in slightly different ways. I don’t think that really specific preferences for art consumption would be something that CEV extrapolates. I think CEV is meant to figure out what general things humans value, not really specific things (e.g., a CEV might say “you want to experience fun adventure stories”; it would not say “read Green Lantern #26” or “read King Solomon’s Mines”). The impression I get is that CEV is more about general things like “How should we treat others?” and “How much effort should we devote to liking activities vs. approving ones?”
I don’t think our values are incoherent: you don’t want to stop me from watching horror movies, and I don’t want to make you watch them. In fact, I think a CEV would probably say “It’s good to have many people who like different activities, because that makes life more interesting and fun.” Some questions (like “Is it okay to torture people?”) likely have only one, or very few, true extrapolated answers, but others, like matters of personal taste, probably vary from person to person. I think a FAI would probably order everyone not to torture toddlers, but I doubt it would order us all to watch “Animal House” at 9:00pm this coming Friday.
But even if I’m wrong, if there’s a Least Convenient Possible world where there are otherwise normal humans who have “kill all gays” irreversibly and directly programmed into their utility function, I don’t think a proper CEV would take that into account.
I’m not sure what to make of your use of the word “proper”. Are you predicting that a CEV will not be utilitarian or saying that you don’t want it to be?
I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call “malicious preferences.” That is, if someone valued frustrating someone else’s desires as an end in itself, not because they needed the resources that person was using or anything like that, the AI would not fulfill that preference.
This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it. My thinking on this was inspired by reading about Bryan Caplan’s debate with Robin Hanson, where Bryan mentioned:
...Robin endorses an endless list of bizarre moral claims. For example, he recently told me that “the main problem” with the Holocaust was that there weren’t enough Nazis! After all, if there had been six trillion Nazis willing to pay $1 each to make the Holocaust happen, and a mere six million Jews willing to pay $100,000 each to prevent it, the Holocaust would have generated $5.4 trillion worth of consumers surplus.
I don’t often agree with Bryan’s intuitionist approach to ethics, but I think he made a good point: satisfying the preferences of those six trillion Nazis doesn’t seem like part of the meaning of right, and I think a CEV of human ethics would reflect this. I think that the preference of the six million Jews to live should be respected and the preferences of the six trillion Nazis should be ignored.
I don’t think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but I think that satisfying “malicious preferences” generates zero or negative utility, no matter how many people hold them. For conflicts of preferences that involve things like disputes over the use of scarce resources, normal utilitarianism applies.
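To make the aggregation rule I have in mind concrete, here is a toy sketch. The Preference structure, its field names, and the malice_weight knob are my own illustrative inventions, not part of CEV or of anyone’s actual proposal: preferences flagged as malicious get weight zero (or a negative weight), and everything else is summed as usual. On Bryan’s numbers, the naive sum reproduces his $5.4 trillion “surplus,” while the modified sum counts only the Jews’ preference to live and comes out strongly negative.

```python
from dataclasses import dataclass

# Toy sketch of utilitarian aggregation that gives "malicious preferences" zero
# (or negative) weight. The numbers just restate Bryan Caplan's hypothetical;
# nothing here is an actual CEV algorithm.

@dataclass
class Preference:
    holder_count: int     # how many people hold this preference
    intensity: float      # each holder's willingness to pay, in dollars
    favors_outcome: bool  # True if they want the outcome, False if they want to prevent it
    malicious: bool       # is frustrating someone else's desires the end in itself?

def naive_net_utility(prefs):
    """Plain aggregation: everyone's willingness to pay counts equally."""
    return sum(
        p.holder_count * p.intensity * (1 if p.favors_outcome else -1)
        for p in prefs
    )

def net_utility_discounting_malice(prefs, malice_weight=0.0):
    """Same sum, but malicious preferences are multiplied by malice_weight (zero by default)."""
    return sum(
        p.holder_count * p.intensity
        * (1 if p.favors_outcome else -1)
        * (malice_weight if p.malicious else 1.0)
        for p in prefs
    )

# Bryan's hypothetical, restated as data:
nazis = Preference(holder_count=6 * 10**12, intensity=1.0, favors_outcome=True, malicious=True)
jews = Preference(holder_count=6 * 10**6, intensity=100_000.0, favors_outcome=False, malicious=False)

print(naive_net_utility([nazis, jews]))              # 5.4e12: the $5.4 trillion "surplus"
print(net_utility_discounting_malice([nazis, jews])) # -6.0e11: the outcome is simply bad
```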
In response to your question I have edited my post and changed “a proper CEV” to “a CEV of human morality.”
I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call “malicious preferences.”
This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it.
Zero is a strange number to have specified there, but then I don’t know the shape of the function you’re describing. I would have expected a non-specific “negative utility” in its place.
Zero is a strange number to have specified there, but then I don’t know the shape of the function you’re describing. I would have expected a non-specific “negative utility” in its place.
You’re probably right, I was typing fairly quickly last night.
I don’t think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but I think that satisfying “malicious preferences” generates zero or negative utility, no matter how many people hold them. For conflicts of preferences that involve things like disputes over the use of scarce resources, normal utilitarianism applies.
Ah, okay. This sounds somewhat like Nozick’s “utilitarianism with side-constraints”. This position seems about as reasonable as the other major contenders for normative ethics, but some LessWrongers (pragmatist, Will_Sawin, etc.) consider it not to be a kind of consequentialism at all.
I’m glad you pointed this out—I don’t think this view is common enough around here.