Alternatively, we could bite the bullet and just say that some humans simply end up with alien values that are not “good”…
Seeing as about 1% of the population is estimated to be psychopathic, not to mention pathological narcissists, megalomaniacs, etc., it seems hard to argue that there isn’t a sizable (if proportionally small) portion of the population who are natural ethical egoists rather than altruists. You could try to weasel around it like Mr. Yudkowsky does, saying that they are not “neurologically intact,” except that there is evidence that psychopathy, at least, is a stable evolutionary strategy rather than a malfunction of normal systems.
I’m usually not one to play the “evil psychopaths” card online, mainly because it’s crass and diminishes the meaning of a useful medical term, but it’s pretty applicable here. What exactly happens to all the psychopaths and people with psychopathic traits when you start extrapolating human values?
Why even stop at psychopaths? There are perfectly neurotypical people with strong desires for revenge-based justice, purity norms that I strongly dislike, etc. I’m not extremely confident that extrapolation will dissolve these values into deeper-order values, although it is comforting that intelligence in humans does at least seem to be correlated with values similar to mine.
Although really, I think this is reaching the point where we have to stop talking in terms of idealized agents with values and start thinking about how these models can be mapped to actual meat brains.
Well, under the shaky assumption that we have the ability to extrapolate in the first place, in practice what happens is that whoever controls the extrapolation sets which values are to be extrapolated, and they have a very strong incentive to put in only their own values.
By definition, no one wants to implement the CEV of humanity more than they want to implement their own CEV. But I would hope that most of the worlds impacted by the various humans’ CEVs would be pretty nice places to live.
That depends. The more interconnected our lives become, the harder it gets to improve my own life or the lives of my loved ones through highly localized interventions. Past a certain scale (vaccination programs are an obvious example), helping yourself and your loved ones is easiest to accomplish by helping everyone together, because the ripple effects reach my loved ones’ loved ones and thus come back around to my loved ones, whom I value in their own right.
Favoring individual volition versus group volition could be a matter of social-graph connectedness and weighting: a sufficiently connected individual who places sufficiently strong value-weight on social ties might feel better about sacrificing some personal preferences to accommodate their connections’ values than about simply subjecting those close social connections to their own personal volition.
Then they have an altruistic EV. That’s allowed.
But as far as your preference goes, your EV >= any other CEV. It has to be that way, tautologically. Extrapolated Volition is defined as what you would choose to do in the counterfactual scenario where you have more intelligence, knowledge, etc. than you do now.
If you’re totally altruistic, it might be that your EV just is the CEV of humanity, but that means there is no difference between the two to prefer, not that you prefer humanity’s CEV over your own. Remember, all your preferences, including the moral and altruistic ones, are already included in your EV.
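To spell out the tautology (a minimal formalization in my own notation, not anything from the CEV write-ups): let O be the set of outcomes and u_i a utility function representing person i’s extrapolated preferences, so that EV_i is just an outcome i ranks highest under u_i. Then

\[
\mathrm{EV}_i \in \arg\max_{o \in O} u_i(o)
\;\Longrightarrow\;
u_i(\mathrm{EV}_i) \ge u_i(o') \quad \text{for every } o' \in O,
\]
\[
\text{and in particular } u_i(\mathrm{EV}_i) \ge u_i(\mathrm{CEV}_{\text{humanity}}),
\]

with equality exactly when implementing humanity’s CEV is itself one of i’s top-ranked outcomes, which is the fully altruistic case described above.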
Sorry, I don’t think I’m being clear.
The notion I’m trying to express is not an entirely altruistic EV, or even a deliberately altruistic EV. Simply, this person has friends and family and such, and thus has a partially social EV; this person is at least altruistic towards close associates when it costs them nothing.
My claim, then, is that if we let n denote the number of hops from any one person to any other in the social graph of such agents:
lim_{n->0} Social Component of Personal EV = species-wide CEV
Now, there may be special cases, such as people who don’t give a shit about anyone but themselves, but the idea is that as social connectedness grows, benefiting only myself and my loved ones becomes more and more expensive and unwieldy (for instance, income inequality and guard labor already have sizable, well-studied economic costs, and that’s before we’re talking about potential improvements to the human condition from AI!) compared to just doing things that are good for everyone, without regard to people’s connection to myself (they’re bound to connect through a mutual friend or relative at some low degree, after all) or their social status (because, again, status enforcement is expensive).
So while the total degree to which I care about other people is limited (Social Component of Personal EV <= Personal EV), eventually that component should approximate the CEV of everyone reachable from me in the social graph.
The question, then, becomes whether that Social Component of my Personal EV is large enough to overwhelm some of my own personal preferences (I participate in a broader society voluntarily) or whether my personal values overwhelm my consideration of other people’s feelings (I conquer the world and crush you beneath my feet).
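To make the convergence claim concrete, here is a minimal toy simulation (entirely my own construction: the one-dimensional “values”, the ring-plus-shortcuts graph, and the hop-decay weighting are simplifying assumptions, not anything from the CEV literature). It treats the social component of a person’s EV as a hop-distance-weighted average of everyone else’s values, treats the species-wide CEV as the plain population mean, and checks whether the two get closer as the graph gets denser and hop counts shrink:

    # Toy model: does a hop-weighted "social EV" approach the population mean
    # (a crude stand-in for a species-wide CEV) as the social graph gets denser?
    import random
    from collections import deque
    from statistics import mean

    def hop_distances(adj, source):
        """BFS hop counts from `source` to every reachable person."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def social_component(adj, values, person, decay=0.5):
        """Hop-weighted average of other people's values (weight = decay**hops)."""
        dist = hop_distances(adj, person)
        weights = {j: decay ** d for j, d in dist.items() if j != person}
        total = sum(weights.values())
        return sum(values[j] * w for j, w in weights.items()) / total

    def random_graph(n, extra_edges, rng):
        """A ring of n people plus `extra_edges` random shortcut acquaintances."""
        adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
        for _ in range(extra_edges):
            a, b = rng.sample(range(n), 2)
            adj[a].add(b)
            adj[b].add(a)
        return adj

    rng = random.Random(0)
    n = 200
    values = [rng.random() for _ in range(n)]   # each person's "personal values"
    species_cev = mean(values)                  # crude stand-in for humanity's CEV

    for extra in (0, 50, 200, 1000, 5000):      # denser graph -> fewer hops
        adj = random_graph(n, extra, rng)
        gaps = [abs(social_component(adj, values, i) - species_cev) for i in range(n)]
        print(f"shortcuts={extra:5d}  mean |social EV - CEV| = {mean(gaps):.4f}")

On this toy model, a sparse ring leaves each person averaging mostly over their immediate neighborhood, so their social component can sit far from the population mean; adding shortcuts shrinks hop counts, flattens the decay weights toward uniform, and pulls everyone’s social component toward the species-wide average, which is the limit gestured at above.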
It seems to me that, to a significant degree, psychopaths are successful because the people around them have problems communicating. Information about what a specific psychopath did to whom is usually not shared. If it were easily accessible to people before they interacted with the psychopath, a lot of the psychopath’s power would be lost.
Despite being introverted by nature, these days my heuristic for dealing with problematic people is to establish good lines of communication among the non-problematic people. People then often realize that what seemed like their own specific problem is in fact almost everyone’s problem with the same person, following the same pattern. When a former mystery becomes an obvious algorithm, it is easier to think about a counter-strategy.
Sometimes the mentally different person beats you not by using a strategy so complex you wouldn’t understand it, but by using a relatively simple strategy that is so weird to you that you just don’t notice it in the hypothesis space (and instead you imagine something more complex and powerful). But once you have enough data to understand the strategy, sometimes you can find and exploit its flaws.
A specific example of a powerful yet vulnerable strategy is lying strategically to everyone around you and establishing yourself as the only channel of information between different groups of people. Then you can make group A believe that group B are idiots and vice versa, and make both groups see you as their secret ally. Your strategy can be stable for a long time, because as long as the groups believe each other to be idiots, they naturally avoid communicating with each other; and when they do communicate, they find the other side has completely wrong information, which they attribute to the other side’s stupidity rather than to your strategic lying.

Yet if there is a person on each side who becomes suspicious of the manipulator, and if these two people can trust each other enough to meet and share their information (what each of them heard about the other side, and what actually happened), and if they make the result known to their respective groups, then… well, I don’t actually know what happens, because right now I am at exactly this point in my specific undisclosed project, but I hope it can seriously backfire on the manipulator.
Of course, this is just speculation. If we made communication among non-psychopaths easier, the psychopaths would also make their next move in the arms race: they could misuse the channels for more powerful attacks, or use manipulation or threats to make people provide incorrect information about them. So it’s not obvious that better communication would mean less power for psychopaths. But it seems to me that a lack of communication always helps them, so more communication should generally be helpful. Even having the concept of a psychopath is helpful, although it can be abused. Investigating the specific weaknesses of psychopaths and making them widely known (just as the weaknesses of average people are generally known) could also reduce their advantage.
However, I imagine that the values of psychopaths are not so different from the values of average people. They are probably a subset, and the missing parts (such as empathy) are what cause the problems. Say, as a huge simplification, that they give extreme priority to feeling superior and watching their enemies crushed, and pretty much ignore everything else. There is a chance their values are different enough that they could be satisfied in a manner we would consider unfriendly if it were applied to us, but which they would not: for example, if reality itself is not valuable to them, why not give them an illusion of maximum superiority, and a happy life to everyone else, so that everyone has their utility function maximized? Maybe they would agree to this solution even if they had perfect intelligence and knowledge.
The wirehead solution applies to a lot more than psychopaths. Why would you consider it unfriendly?