Do we have any evidence for that, except saying that our modern values are “good” according to themselves, so whatever historical process led to them must have been “progress”?
Yes, we do. First, we have an understanding of the mechanisms and processes that produced both old and modern values, and many of the mechanisms and processes used for “ought” questions are also used for “is” questions. Our ability to answer “is” questions accurately has improved dramatically, so we know those shared mechanisms have improved. Second, many of our values depend on factual underpinnings which used to be unknown or misunderstood. Our values have also improved on the measures of symmetry and internal consistency. Finally, we have identified causal mechanisms underpinning many old values, and found them repugnant.
How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?
The first reason is predictability. Each person’s volition is a noisy, unstable and random thing. The CEV of all humanity is a better approximation of what my values will be in a kiloyear than my present volition is. The second reason is that people’s utility functions don’t exist in a vacuum; before making a major decision, people consult with other people and/or imagine their reactions, so you can’t separate one mind out from humanity without unpredictable consequences. Finally, it’s hard to make an AGI if the rest of humanity thinks you’re a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you’re better off being the sort of researcher who would incorporate all humanity’s values than the sort of researcher who wouldn’t.
Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of “intelligence”, and what other such things are there?
The more of the foundation we leave to non-human intelligences, the more likely it is to go wrong. If you fail to design an AGI to optimize CEV, it will optimize something else, and most of the things it could optimize instead are very bad.
Finally, it’s hard to make an AGI if the rest of humanity thinks you’re a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you’re better off being the sort of researcher who would incorporate all humanity’s values than the sort of researcher who wouldn’t.
If you’re openly making a fooming AGI, and if people think you have a realistic chance of success and treat you seriously, then I’m very sure that all major world governments, armies, etc. (including your own) as well as many corporations and individuals will treat you as a supervillain—and it won’t matter in the least what your goals might be, CEV or no.
Finally, we have identified causal mechanisms underpinning many old values, and found them repugnant.
This is exactly the kind of reasoning I mocked in the post.
No, you mocked finding the values themselves repugnant, not their underlying mechanisms. If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn’t the status quo would reject it when it was explained to them, then we should reject that value.
All such desiderata get satisfied automatically if your comment was generated by your sincere volition and not something else :-)
The fact that my volition might just consist of a pointer to CEV does not seem like much of an argument for choosing it over CEV, given that my volition also includes lots of poorly-understood other stuff, which I won’t get a chance to inspect if there’s no extrapolation, and which is more likely to make things worse than to make them better. Also, consider the worst case scenario: I have a stroke shortly before the AI reads out my volition.
I think your arguments, if they worked, would prove way too much.
If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn’t the status quo would reject it when it was explained to them, then we should reject that value.
This standard allows us to throw away all values not directly linked to inclusive genetic fitness, and maybe even those that are. There’s no objective morality.
The fact that my volition might just consist of a pointer to CEV does not seem like much of an argument for choosing it over CEV, given that my volition also includes lots of poorly-understood other stuff, which I won’t get a chance to inspect if there’s no extrapolation, and which is more likely to make things worse than to make them better.
This argument works just as well for defending concrete wishes (“volcano lair with catgirls”) over CEV.
If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn’t the status quo would reject it when it was explained to them, then we should reject that value.
This standard allows us to throw away all values not directly linked to inclusive genetic fitness, and maybe even those that are. There’s no objective morality.
Huh? We must have a difference of definitions somewhere, because that’s not what I think my argument says at all.
The fact that my volition might just consist of a pointer to CEV does not seem like much of an argument for choosing it over CEV, given that my volition also includes lots of poorly-understood other stuff, which I won’t get a chance to inspect if there’s no extrapolation, and which is more likely to make things worse than to make them better.
This argument works just as well for defending concrete wishes (“volcano lair with catgirls”) over CEV.
No, it doesn’t. This was a counterargument to the could-be-a-pointer argument, not a root-level argument; and if you expand it out, it actually favors CEV over concrete wishes, not the reverse.
The could-be-a-pointer argument says that, since one person’s volition might just be the desire to have CEV implemented, that one person’s volition is at least as good as CEV. But this is wrong, because that person’s volition will also include lots of other stuff, which is substantially random, so at least some of it will be bad. So you need to filter (extrapolate) those desires to get only the good ones. One way we could filter them is by throwing out everything except a few concrete wishes, but that is not the best possible filter, because it would throw out many aspects of volition that are good (and probably also necessary for preventing disastrous misinterpretations of the concrete wishes).
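To make that comparison of filters a bit more concrete, here is a minimal toy sketch in Python. It is purely illustrative and rests entirely on assumptions introduced here, not on anything in the CEV document: each component of a volition gets a made-up numeric value, “extrapolation” is idealized as keeping exactly the positively-valued components, and all names are placeholders.

```python
import random

# Toy model of the filtering argument above. Everything here is invented for
# illustration: the numeric "value" of each desire, the names of the desires,
# and the idea that extrapolation can be modeled as keeping the positive ones.
random.seed(0)

def make_volition(n_other=50):
    """A volition: one pointer-to-CEV desire, one concrete wish, plus lots of
    poorly-understood other stuff, some of which is bad (negative value)."""
    volition = [
        ("pointer to CEV", 5.0),
        ("volcano lair with catgirls", 1.0),
    ]
    volition += [(f"other desire {i}", random.gauss(0.0, 2.0)) for i in range(n_other)]
    return volition

def total_value(volition, keep):
    """Sum the value of the components that a given filter keeps."""
    return sum(value for desire, value in volition if keep(desire, value))

v = make_volition()

no_filter     = total_value(v, lambda d, val: True)                               # raw volition
concrete_only = total_value(v, lambda d, val: d == "volcano lair with catgirls")  # crude filter
extrapolated  = total_value(v, lambda d, val: val > 0)                            # idealized filter

print(f"no filter:      {no_filter:.1f}")
print(f"concrete wish:  {concrete_only:.1f}")
print(f"extrapolated:   {extrapolated:.1f}")
```

Under this toy assignment the unfiltered volition mixes good with bad, the concrete-wish filter discards nearly everything good, and the idealized extrapolation keeps only the good parts, which is the shape of the argument above.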
If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn’t the status quo would reject it when it was explained to them, then we should reject that value.
How confident are you that what’s left of our values, under that rule, would be enough to be called a volition at all?
Finally, we have identified causal mechanisms underpinning many old values, and found them repugnant.
This does not mean that people from the old societies which had those values would also find them repugnant if they understood these causal mechanisms. Understanding isn’t the problem. Values are often top-level goals and to that extent arbitrary.
For instance, many people raised to believe in God #1 value worshipping him. They understand that they feel this way because they were taught it as children. They understand that if they had, counterfactually, been exchanged as newborns and grown up in a different society, they would worship God #2 instead. This does not cause them to hold God #1’s values any less strongly.
My reading of society is that such understanding does move values, at least if the person starts in a universalist religion, like Christianity. But such understanding is extremely rare.
I would rephrase that as “such understanding moves values extremely rarely”. I think it’s not very rare for the understanding to exist but be compartmentalized.
Finally, it’s hard to make an AGI if the rest of humanity thinks you’re a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you’re better off being the sort of researcher who would incorporate all humanity’s values than the sort of researcher who wouldn’t.
Good point. But that can only work if your research is transparent. Otherwise, why would anyone believe you are not just signaling this attitude while secretly pursuing your own selfish goals? That is why governments get the complete source code of software products from companies like Microsoft.
In the context of machine intelligence, I reckon that means open-source software.
I figure, if you try and keep your source code secret, only fools will trust you. More to the point, competing organisations—who are more willing to actually share their results—are likely to gain mindshare, snowball, and succeed first.
Of course, it doesn’t always work like that. There’s a lot of secret sauce out there—especially server-side. However, for ethical coders, this seems like a no-brainer to me.
I think you didn’t really engage with the questions above.
Are you claiming that it is intrinsically unethical to have closed-source code?
No. Keeping secrets is not normally considered to be “unethical”—but it is a different goal from trying to do something good.