my CEV, my wife’s CEV, Biden’s CEV and Putin’s CEV are four quite different CEVs
It really depends on the extrapolation process—which features of your minds are used as input, and what the extrapolation algorithm does with them.
To begin with, there is a level of abstraction at which the minds of all four of you are the same, yet different from various nonhuman minds. If the extrapolation algorithm is “identify the standard cognitive architecture of this entity’s species, and build a utopia for that kind of mind”, then all the details which make you, your wife, and the two presidents different from each other play no role in the process. On the other hand, if the extrapolation algorithm is “identify the current values of this specific mind, and build it a utopia”, you probably get different futures.
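To make the structural difference concrete, here is a deliberately crude toy sketch. Everything in it (the Mind class, the value dictionaries, both functions) is a placeholder of my own invention, not anything specified in the CEV texts; it only shows why option one gives the four of you the same future and option two need not.

```python
# Toy model, purely illustrative: a "mind" is reduced to a species tag
# plus a dict of idiosyncratic values. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Mind:
    species: str
    personal_values: dict = field(default_factory=dict)

def extrapolate_species_level(mind: Mind) -> str:
    # Option 1: only the species-typical architecture is used as input,
    # so any two humans map to the same outcome.
    return f"utopia for the standard {mind.species} cognitive architecture"

def extrapolate_individual_level(mind: Mind) -> str:
    # Option 2: the current values of this specific mind are the input,
    # so different people can get different futures.
    return f"utopia optimized for {sorted(mind.personal_values.items())}"

a = Mind("human", {"art": 0.9, "power": 0.1})
b = Mind("human", {"art": 0.2, "power": 0.8})

assert extrapolate_species_level(a) == extrapolate_species_level(b)
assert extrapolate_individual_level(a) != extrapolate_individual_level(b)
```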
The original CEV proposal is intended more in the spirit of the first option—what gets extrapolated is not the contingencies of any particular human mind, but the cognitive universals that most humans share. Furthermore, the extrapolation algorithm itself is supposed to be derived from something cognitively universal to humans.
That’s why Eliezer sometimes used the physics metaphor of renormalization. One starts with the hypothesis that there is some kind of universal human decision procedure (or decision template that gets filled out differently in different individuals), arising from our genes (and perhaps also from the environment, including enculturation). That is the actual “algorithm” that determines individual human choices.
Then, once that algorithm has been extracted via AI (in principle it could be figured out by human neuroscientists working without AI, but it’s a bit late for that now), it is improved according to criteria that come from some part of the algorithm itself. That’s the “renormalization” part: normative self-modification of the abstracted essence of the human decision procedure, which is improved, e.g., by applying its own metaethical criteria to itself.
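Purely as an illustration of the shape of such a process, not of its content (the content is exactly the open problem), here is a toy fixed-point loop in which a “decision procedure” is revised by a criterion stored inside the procedure itself. Every specific in it, from the weight dictionary to the rounding rule standing in for “metaethics”, is a made-up placeholder of mine.

```python
# Toy "renormalization": revise a decision procedure using an evaluation
# criterion that is itself part of the procedure, until revision no longer
# changes anything. All specifics are made-up placeholders.

def renormalize(procedure, improve, max_steps=100):
    """Apply the procedure's own meta-criterion to itself until a fixed point."""
    for _ in range(max_steps):
        revised = improve(procedure, meta_criterion=procedure["metaethics"])
        if revised == procedure:   # fixed point: no further self-endorsed change
            return revised
        procedure = revised
    return procedure

def toy_improve(procedure, meta_criterion):
    # Trivial stand-in for "improvement": re-derive the value weights
    # according to the procedure's own internal criterion.
    revised = dict(procedure)
    revised["weights"] = {k: meta_criterion(v) for k, v in procedure["weights"].items()}
    return revised

human_template = {
    "weights": {"fairness": 0.61803, "survival": 0.9999},
    "metaethics": lambda v: round(v, 1),   # the procedure's own improvement criterion
}

print(renormalize(human_template, toy_improve)["weights"])
# {'fairness': 0.6, 'survival': 1.0}
```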
What I just described is the improvement of an individual human mind-template. On the other hand, CEV is supposed to provide the decision theory of a human-friendly AI, which will pertain more to a notion of common good, presumably aggregating individual needs and desires in some way. But again, it would be a notion of common good that arises from a “renormalized” standard-human concept of common good.
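How that aggregation works is left open above (“in some way”), so the following is only one arbitrary toy rule, chosen by me for illustration: average the value dimensions on which the extrapolated individuals roughly agree, and drop the ones on which they conflict. Neither the threshold nor the averaging is part of any actual CEV proposal.

```python
# Toy stand-in for "aggregating individual needs and desires in some way".
# The agreement threshold and the averaging rule are arbitrary choices
# made for illustration, not part of CEV.
from statistics import mean, pstdev

def aggregate(extrapolated_values: list[dict], agreement_threshold: float = 0.1) -> dict:
    shared_keys = set.intersection(*(set(v) for v in extrapolated_values))
    common_good = {}
    for key in sorted(shared_keys):
        scores = [v[key] for v in extrapolated_values]
        if pstdev(scores) <= agreement_threshold:   # the wishes roughly cohere here
            common_good[key] = mean(scores)         # so this dimension enters the common good
    return common_good                              # conflicting dimensions are simply dropped

people = [
    {"health": 0.95, "art": 0.9, "power": 0.1},
    {"health": 0.90, "art": 0.2, "power": 0.8},
]
print(aggregate(people))   # {'health': 0.925}; 'art' and 'power' conflict and drop out
```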
The way I’ve just explained “the original CEV proposal” is, to be sure, something of my own interpretation. I’ve provided a few details which aren’t in the original texts. But I believe I have preserved the basic ideas. For some reason, they were never developed very much. Maybe MIRI deemed it dangerous to discuss publicly—too close to the core of what alignment needs to get right—so it was safer to focus publicly on other aspects of the problem, like logical induction. (That is just my speculation, by the way. I have no evidence of that at all.) Maybe it was just too hard, with too many unknowns about how the human decision procedure actually works. Maybe, once the deep learning revolution was underway, the challenges of simpler kinds of alignment, and the Christiano paradigm that ended up being used at OpenAI, absorbed everyone’s attention; and the CEV ideal, of alignment sufficient to be the seed of a humane transhuman civilization, was put to one side. Or maybe I have just overlooked papers that do develop CEV? There are still a few people trying to figure out everything from first principles, rather than engaging in the iterative test-and-finetune approach that currently prevails.
(in principle it could be figured out by human neuroscientists working without AI, but it’s a bit late for that now)
What? Why? There is no AI as of now; LLMs definitely do not count. I think it is still quite possible that neuroscience will make its breakthrough on its own, without help from any non-human mind (again, having an AI dress up the final article doesn’t count; we’re talking about the general insights and analysis here).
To begin with, there is a level of abstraction at which the minds of all four of you are the same, yet different from various nonhuman minds.
I am actually not even sure about that. Your “identify the standard cognitive architecture of this entity’s species” presupposes that such an architecture exists, and in a sufficiently specified form that one can build its utopia and derive that identification correctly in all four cases.
But, more importantly, I would say that this algorithm does not derive my CEV in any useful sense.