Roman values aren’t stable under reflection; the CEV of Rome doesn’t have the same values as ancient Rome. It’s like a 5-year-old locking in what they want to be when they grow up.
Locking in extrapolated Roman values sounds great to me because I don’t expect that to be significantly different than a broader extrapolation. Of course, this is all extremely handwavy and there are convergence issues of superhuman difficulty! :)
Roman values aren’t stable under reflection; the CEV of Rome doesn’t have the same values as ancient Rome.
I’m not exactly sure what you’re saying here, but if you’re saying that the fact of modern Roman values being different than Ancient Roman values shows that Ancient Roman values aren’t stable under reflection, then I totally disagree. History playing out is a very different process from an individual person reflecting on their values, so the fact that Roman values changed as history played out from Ancient Rome to modern Rome does not imply that an individual Ancient Roman’s values are not stable under reflection.
As an example, Country A conquering Country B could lead the descendants of Country B’s population to have the values of Country A 100 years hence, but this information has nothing to do with whether a pre-conquest Country B citizen would come to have Country A’s values on reflection.
Locking in extrapolated Roman values sounds great to me because I don’t expect that to be significantly different than a broader extrapolation.
I guess I just have very different intuitions from you on this. I expect people from different historical time periods and cultures to have quite different extrapolated values. I think the concept that all peoples throughout history would come into near agreement about what is good if they just reflected on it long enough is unrealistic.
(unless, of course, we snuck a bit of motivated reasoning into the design of our Value Extrapolator so that it just happens to always output values similar to our 21st century Western liberal values...)
I think the concept that all peoples throughout history would come into near agreement about what is good if they just reflected on it long enough is unrealistic.
Yes. Exactly. You don’t even need to go through time; place and culture on modern-day Earth are sufficient. While I cannot know my CEV (for if I knew, I would be there already), I predict with high confidence that my CEV, my wife’s CEV, Biden’s CEV and Putin’s CEV are four quite different CEVs, even if they all include as a consequence “the planet existing as long as the CEV’s bearer and the beings the CEV’s bearer cares about are on it”.
my CEV, my wife’s CEV, Biden’s CEV and Putin’s CEV are four quite different CEVs
It really depends on the extrapolation process—which features of your minds are used as input, and what the extrapolation algorithm does with them.
To begin with, there is a level of abstraction at which the minds of all four of you are the same, yet different from various nonhuman minds. If the extrapolation algorithm is “identify the standard cognitive architecture of this entity’s species, and build a utopia for that kind of mind”, then all the details which make you, your wife, and the two presidents different from each other play no role in the process. On the other hand, if the extrapolation algorithm is “identify the current values of this specific mind, and build it a utopia”, you probably get different futures.
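To make the contrast concrete, here is a toy sketch of the two options in Python. It is purely illustrative: Mind, identify_species_architecture and build_utopia are hypothetical stand-ins for processes nobody currently knows how to implement, not anything taken from the CEV texts.

```python
# Toy sketch only: Mind, identify_species_architecture and build_utopia are
# hypothetical stand-ins for processes nobody knows how to implement.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mind:
    species: str                                     # e.g. "human"
    individual_values: Dict[str, float] = field(default_factory=dict)


def identify_species_architecture(species: str) -> str:
    # Placeholder for "find the standard cognitive architecture of this species".
    return f"the standard {species} cognitive architecture"


def build_utopia(value_spec) -> str:
    # Placeholder for "optimize the future according to value_spec".
    return f"a future optimized for {value_spec}"


def species_level_extrapolation(minds: List[Mind]) -> str:
    """Option 1: extrapolate from the shared cognitive architecture.
    Individual differences among the input minds play no role, so any group
    of same-species minds yields the same single output."""
    return build_utopia(identify_species_architecture(minds[0].species))


def individual_level_extrapolation(minds: List[Mind]) -> List[str]:
    """Option 2: extrapolate each specific mind's current values.
    Different inputs generally yield different futures."""
    return [build_utopia(m.individual_values) for m in minds]
```

The only point is that the first option discards individual differences by construction, while the second preserves them, so only the second can yield four different futures for the four of you.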
The original CEV proposal is intended more in the spirit of the first option—what gets extrapolated is not the contingencies of any particular human mind, but the cognitive universals that most humans share. Furthermore, the extrapolation algorithm itself is supposed to be derived from something cognitively universal to humans.
That’s why Eliezer sometimes used the physics metaphor of renormalization. One starts with the hypothesis that there is some kind of universal human decision procedure (or decision template that gets filled out differently in different individuals), arising from our genes (and perhaps also from the environment, including enculturation). That is the actual “algorithm” that determines individual human choices.
Then, once that algorithm has been extracted via AI (in principle it could be figured out by human neuroscientists working without AI, but it’s a bit late for that now), it is improved according to criteria that come from some part of the algorithm itself. That’s the “renormalization” part: normative self-modification of the abstracted essence of the human decision procedure. The essence of the human decision procedure, improved e.g. by application of its own metaethical criteria.
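Here is a minimal sketch of how I picture that “renormalization” loop, under the (loud) assumption that the extracted procedure can be represented as a set of object-level values plus its own criteria for judging them. Every name below is my own illustrative invention, not something from the original proposal.

```python
# Hedged sketch of the "renormalization" loop: the abstracted decision
# procedure revises its own values using criteria drawn from itself, until it
# stops proposing changes. All names here are my illustrative assumptions.
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple

# A criterion inspects the current values and returns proposed amendments;
# an empty dict means "no objection".
Criterion = Callable[[Dict[str, float]], Dict[str, float]]


@dataclass(frozen=True)
class DecisionProcedure:
    values: Dict[str, float]          # what the procedure currently endorses
    criteria: Tuple[Criterion, ...]   # the part of itself used to judge those values


def renormalize(proc: DecisionProcedure, max_rounds: int = 100) -> DecisionProcedure:
    """Apply the procedure's own metaethical criteria to itself until it
    reaches a fixed point (it endorses itself as-is), or until we give up."""
    for _ in range(max_rounds):
        raw: Dict[str, float] = {}
        for criterion in proc.criteria:
            raw.update(criterion(proc.values))
        # keep only genuine changes, so self-endorsement is a real fixed point
        changes = {k: v for k, v in raw.items() if proc.values.get(k) != v}
        if not changes:
            return proc
        proc = replace(proc, values={**proc.values, **changes})
    return proc                       # convergence is not guaranteed
```

Whether such a loop converges at all, and whether different starting minds would converge to the same place, is of course exactly the open question being argued about in this thread.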
What I just described is the improvement of an individual human mind-template. On the other hand, CEV is supposed to provide the decision theory of a human-friendly AI, which will pertain more to a notion of common good, presumably aggregating individual needs and desires in some way. But again, it would be a notion of common good that arises from a “renormalized” standard-human concept of common good.
The way I’ve just explained “the original CEV proposal” is, to be sure, something of my own interpretation. I’ve provided a few details which aren’t in the original texts. But I believe I have preserved the basic ideas. For some reason, they were never developed very much. Maybe MIRI deemed it dangerous to discuss publicly—too close to the core of what alignment needs to get right—so it was safer to focus publicly on other aspects of the problem, like logical induction. (That is just my speculation, by the way. I have no evidence of that at all.) Maybe it was just too hard, with too many unknowns about how the human decision procedure actually works. Maybe, once the deep learning revolution was underway, the challenges of simpler kinds of alignment, and the Christiano paradigm that ended up being used at OpenAI, absorbed everyone’s attention; and the CEV ideal, of alignment sufficient to be the seed of a humane transhuman civilization, was put to one side. Or maybe I have just overlooked papers that do develop CEV? There are still a few people trying to figure out everything from first principles, rather than engaging in the iterative test-and-finetune approach that currently prevails.
Independently,

(in principle it could be figured out by human neuroscientists working without AI, but it’s a bit late for that now)
What? Why? There is no AI as of now; LLMs definitely do not count. I think it is still quite possible that neuroscience will make its breakthrough on its own, without help from any non-human mind (again, dressing up the final article doesn’t count; we’re talking about the general insights and analysis here).
To begin with, there is a level of abstraction at which the minds of all four of you are the same, yet different from various nonhuman minds.
I am actually not even sure about that. Your “identify the standard cognitive architecture of this entity’s species” presupposes that such an architecture exists, in a sufficiently specified way to then build its utopia and to derive that identification correctly in all four cases.
But, more importantly, I would say that this algorithm does not derive my CEV in any useful sense.
I meant I don’t think the CEV of ancient Rome has the same values as ancient Rome. Looks like your comment got truncated: “what is good if they were just”
Edited to fix.