Sure, we favor the particular Should Function that is, today, instantiated in the brains of intelligent, roughly politically middle-of-the-road Westerners.
Do you think there is no simple procedure that would find roughly the same “should function” hidden somewhere in the brain of a brainwashed, bloodthirsty religious zealot? It doesn’t need to be what the person believes, or what the person would recognize as valuable; just something extractable from the person, according to a criterion that might be very alien to their conscious mind. Not all opinions (beliefs/likes) are equal, and I wouldn’t want to get stuck with the wrong optimization criterion just because I happened to be born in the wrong place and didn’t (yet!) get the chance to learn more about the world.
(I’m avoiding the term ‘preference’ to remove connotations I expect it to have for you, for what I consider the wrong reasons.)
A lot of people seem to want to have their cake and eat it with CEV. Haidt has shown us that human morality is universal in form and local in content, and has gone on to do case studies showing that there are 5 basic human moral dimensions (harm/care, justice/fairness, loyalty/ingroup, respect/authority, purity/sacredness), and our culture only has the first two.
It seems that there is no way you can run an honestly morally neutral CEV of all of humanity and expect to reliably get something you want. You can either rig CEV so that it tweaks people who don’t share our moral drives, or you can just cross your fingers and hope that the process of extrapolation causes convergence to our idealized preferences; if you’re wrong, you’ll find yourself in a future that is suboptimal.
Haidt just claims that the relative balance of those five clusters differs across cultures; they’re present in all.
On one hand, using preference-aggregation is supposed to give you the outcome preferred by you to a lesser extent than if you just started from yourself. On the other hand, CEV is not “morally neutral”. (Or at least, the extent to which preference is given in CEV implicitly has nothing to do with preference-aggregation.)
We have a tradeoff between the number of people to include in preference-aggregation and the value-to-you of the outcome. So this is a situation in which to use the reversal test. If you consider including only the smart, sane Westerners preferable to including all presently alive folks, then you need a good argument for why you wouldn’t want to exclude some of the smart, sane Westerners as well, up to the point of leaving only yourself.
Yes, a CEV of only yourself is, by definition, optimal.
The reason I don’t recommend you try it is that it is infeasible: the probability of success is very low. By including a bunch of people who (you have good reason to think) are a lot like you, you eventually reach the optimal point in the tradeoff between quality of outcome and probability of success.
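This tradeoff can be sketched numerically. Everything below — the decay of outcome value and the growth of success probability with coalition size — is an invented toy model with made-up constants, purely for illustration:

```python
# Toy model of the tradeoff between quality of outcome and
# probability of success. All functional forms and constants
# are invented assumptions, not anything from CEV itself.

def outcome_value(n: int) -> float:
    """Value-to-you of the outcome; assumed to decay as more
    (increasingly unlike-you) people are included."""
    return 1.0 / (1.0 + 0.001 * n)

def success_probability(n: int) -> float:
    """Chance the project succeeds at all; assumed to grow with
    the number of included (hence supportive) people."""
    return 1.0 - 0.999 ** n

def expected_value(n: int) -> float:
    return outcome_value(n) * success_probability(n)

# Under these assumptions the expected value peaks at an
# intermediate coalition size: neither "only yourself" (n = 1)
# nor "everyone" comes out optimal.
best = max(range(1, 20000), key=expected_value)
print(best, expected_value(best))
```

The precise optimum is an artifact of the invented curves; the point is only that both endpoints lose to some interior coalition size.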
I hope you realize that you are in flat disagreement with Eliezer about this. He explicitly affirmed that running CEV on himself alone, if he had the chance to do it, would be wrong.
Confirmed.
Eliezer quite possibly does believe that. That he can make that claim with some credibility is one of the reasons I am less inclined to use my resources to thwart Eliezer’s plans for future light cone domination.
Nevertheless, Roko is right more or less by definition and I lend my own flat disagreement to his.
“Low probability of success” should of course include game-theoretic considerations where people are more willing to help you if you give more weight to their preferences (and should refuse to help you if you give them too little, even if it’s much more than the status quo, as in the Ultimatum game). As a rule, in the Ultimatum game you should give away more if giving away less would cost you the deal. When you lose value to other people in exchange for their help, having compatible preferences doesn’t necessarily significantly alleviate this loss.
Sorry, I don’t follow this: can you restate?
having compatible preferences doesn’t necessarily significantly alleviate this loss.
I know about the ultimatum game, but it is game-theoretically interesting precisely because the players have different preferences: I want all the money for me, you want all of it for you.
The Ultimatum game was mentioned primarily as a reminder that the amount of FAI-value traded for assistance may be orders of magnitude greater than what the assistance feels to amount to.
We might as well take as a given that all the values under discussion are (at least to some small extent) different. The “all of the money” here corresponds to the points of disagreement: mutually exclusive features of the future. But you are not trading value for value. You are trading value-after-FAI for assistance-now.
If two people compete to provide you an equivalent amount of assistance, you should be indifferent between them in accepting it, which means it should cost you an equivalent amount of value. If Person A has preferences close to yours, and Person B has preferences distant from yours, then by losing the same amount of value you can help Person A more than Person B. Thus, if we assume egalitarian “background assistance”, provided implicitly by e.g. not revolting and stopping the FAI programmer, then everyone can still get a slice of the pie, no matter how distant their values. If nothing else, the more alien people should strive to help you more, so that you’ll be willing to part with more value for them (the marginal value of providing assistance is greater for distant-preference folks).
Thanks for the explanation.
FAI-value traded for assistance may be orders of magnitude greater than what the assistance feels to amount to.
Another way to put this is that when people negotiate, they do best, all other things equal, if they try to drive a very hard bargain. If my neighbour Claire and I are both from roughly the same culture, upbringing, etc., and we are together going to build an AI which will extrapolate a combination of our volitions, Claire might do well to demand a 99% weighting for her volition, and maybe I’ll bargain her down to 90% or something.
Bob the babyeater might offer me the same help that Claire could have given in exchange for just a 1% weighting of his volition, by the principle that I am making the same sacrifice in giving 99% of the CEV to Claire as in giving 1% to Bob.
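The equal-sacrifice arithmetic behind the Claire/Bob comparison can be made explicit. The linear overlap model and all the numbers below are my own assumptions, for illustration only:

```python
# Sketch of the "equal sacrifice" point: ceding weight to someone
# whose volition mostly overlaps mine costs me little per unit of
# weight, so a distant-preference party buys far less weight with
# the same amount of assistance. Linear model assumed throughout.

def my_sacrifice(weight_to_them: float, overlap: float) -> float:
    """Value I give up by ceding `weight_to_them` of the CEV
    weighting to a party whose extrapolated volition overlaps
    mine by `overlap` (both in [0, 1])."""
    return weight_to_them * (1.0 - overlap)

claire = my_sacrifice(0.99, overlap=0.99)  # near-twin neighbour
bob = my_sacrifice(0.01, overlap=0.01)     # Bob the babyeater

# Ceding 99% to Claire costs me the same as ceding 1% to Bob,
# which is why Claire-sized help can buy Bob only a 1% slice.
print(claire, bob)
```

Under this (assumed) linear model the two sacrifices come out identical, matching the 99%-versus-1% claim above.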
In reality, however, humans tend to live and work with people who are like them, rather than people who are unlike them. And the world we live in doesn’t have a uniform distribution of power and knowledge across cultures.
If nothing else, the more alien people should strive to help you more, so that you’ll be willing to part with more value for them
Many “alien” cultures are too powerless compared to ours to do anything. However, China and India are potential exceptions. The USA and China may end up in a dictator game over FAI motivations.
All I am saying is that the egalitarian desire to include all of humanity in CEV, each with equal weight, is not optimal. Yes to a dictator game/negotiation with China; yes to a dictator game/negotiation within the US/EU/Western bloc.
Excluding a group from the CEV doesn’t mean disenfranchising them; it means enfranchising them according to your definition of enfranchisement. Cultures in North Africa that genitally mutilate women should not be included in CEV, but I predict that my CEV would treat their culture with respect and dignity, including, in some cases, interfering to prevent them from using their share of the light-cone to commit extreme acts of torture or oppression.
You don’t include cultures in CEV; you filter people through extrapolation of their volition. Even if culture makes values differ, “mutilating women” is not the kind of thing that gets through, and so it is a broken prototype example to draw attention to.
In any case, my argument in the above comment was that value should be given (theoretically, if everyone understands the deal and the relevant game theory, etc., etc.; realistically, such a deal must be simplified; you may even get away with cheating) according to provided assistance, not according to compatibility of values. If poor compatibility of values prevents people from giving assistance, this is an effect of value completely unrelated to post-FAI compatibility, and given that assistance can be given with money, the effect itself doesn’t seem real either. You may well exclude the people of Myanmar, because they are poor and can’t affect your success, but not the people of a generous/demanding genocidal cult, for the irrelevant reason that they are evil. Game theory is cynical.
How do you know? If enough people want it strongly enough, it might.
How strongly people want something now doesn’t matter; reflection has the power to wipe current consensus clean. You are not cooking a mixture of wants, you are letting them fight it out, and a losing want doesn’t have to leave any residue. Only to the extent that current wants indicate extrapolated wants should we take them into account.
You are not cooking a mixture of wants, you are letting them fight it out, and a losing want doesn’t have to leave any residue.
Sure. And tolerance, gender equality, multiculturalism, personal freedoms, etc might lose in such a battle. An extrapolation that is more nonlinear in its inputs cuts both ways.
Might “mutilating men” make it through?
(Sorry for the euphemism; I mean male circumcision.)
you think there is no simple procedure that would find roughly the same “should function” hidden somewhere in the brain of a brain-washed blood-thirsty religious zealot?
Sure, the Kolmogorov complexity of a set of edits to change the moral reflective equilibrium of a human is probably pretty low compared to the complexity of the overall human preference set. But that works the other way around too. Somewhere hidden in the brain of a liberal western person is a murderer/terrorist/child abuser/fundamentalist, if you just perform the right set of edits.
But that works the other way around too. Somewhere hidden in the brain of a liberal western person is a murderer/terrorist/child abuser/fundamentalist, if you just perform the right set of edits.
Again, not all beliefs are equal. You don’t want to use the procedure that’ll find a murderer in yourself, you want to use the procedure that’ll find a nice fellow in a murderer. And given such a procedure, you won’t need to exclude murderers from extrapolated volition.