He hasn’t taken a position on CEV, as far as I can tell.
I (and probably many others) would be really interested in the opinion of other “famous lesswrongers” such as Yvain, Alicorn, Kaj Sotala, or you, Wei Dai.
I’m curious enough about this to look up the answers for you, but next time try “Google”.
Yvain: Coherent extrapolated volition utilitarianism is especially interesting; it says that instead of using actual preferences, we should use ideal preferences—what your preferences would be if you were smarter and had achieved more reflective equilibrium—and that instead of having to calculate each person’s preference individually, we should abstract them into an ideal set of preferences for all human beings. This would be an optimal moral system if it were possible, but the philosophical and computational challenges are immense.
Kaj: Some informal proposals for defining Friendliness do exist. The one that currently seems most promising is called Coherent Extrapolated Volition. In the CEV proposal, an AI will be built (or, to be exact, a proto-AI will be built to program another) to extrapolate what the ultimate desires of all the humans in the world would be if those humans knew everything a superintelligent being could potentially know; could think faster and smarter; were more like they wanted to be (more altruistic, more hard-working, whatever your ideal self is); would have lived with other humans for a longer time; had mainly those parts of themselves taken into account that they wanted to be taken into account. The ultimate desire—the volition—of everyone is extrapolated, with the AI then beginning to direct humanity towards a future where everyone’s volitions are fulfilled in the best manner possible. The desirability of the different futures is weighted by the strength of humanity’s desire—a smaller group of people with a very intense desire to see something happen may “overrule” a larger group who’d slightly prefer the opposite alternative but doesn’t really care all that much either way. Humanity is not instantly “upgraded” to the ideal state, but instead gradually directed towards it.
CEV avoids the problem of its programmers having to define the wanted values exactly, as it draws them directly out of the minds of people. Likewise it avoids the problem of confusing ends with means, as it’ll explicitly model society’s development and the development of different desires as well. Everybody who thinks their favorite political model happens to objectively be the best in the world for everyone should be happy to implement CEV—if it really turns out that it is the best one in the world, CEV will end up implementing it. (Likewise, if it is the best for humanity that an AI stays mostly out of its affairs, that will happen as well.) A perfect implementation of CEV is unbiased in the sense that it will produce the same kind of world regardless of who builds it, and regardless of what their ideology happens to be—assuming the builders are intelligent enough to avoid including their own empirical beliefs (aside from the bare minimum required for the mind to function) into the model, and trust that if they are correct, the AI will figure them out on its own.
Alicorn: But I’m very dubious about CEV as a solution to fragility of value, and I think there are far more and deeper differences in human moral beliefs and human preferences than any monolithic solution can address. That doesn’t mean we can’t drastically improve things, though—or at least wind up with something that I like!
See also Criticisms of CEV (request for links).
Thanks, this is awesome!
I’m sorry....