If you have citations for EY articulating the idea that writing superior nonhuman values would take too long to do, rather than that it’s fundamentally incoherent, I’d be interested. This would completely change my understanding of the whole Metaethics Sequence.
Whole brain emulation would basically be “copying” human values into a machine, and would demonstrate that “writing” human values is possible. You could then edit a couple of morally relevant bits, and you’d be demonstrating that you could “create” a human-like but slightly edited morality. Evaluating whether it is “superior” by some metric would be a whole additional exercise, though.
I don’t think the metaethics sequence implies that writing down values is impossible, just that human values are very complex and messy.
Sure, if we drop the idea of “superior,” I agree completely that it’s possible (in principle) to write a set of values, and that the metaethics sequence does not imply otherwise.
And, also, it implies—well, it asserts—that human values are very complex and messy, as you say.
IIRC, it also asserts that human values are right. Which is why I think that on EY’s view, evaluating whether the “edited morality” you describe here is superior to human values is not just an additional exercise, but an unnecessary (and perhaps incoherent) one. On his view, I think we can know a priori that it isn’t.
Actually, now that I think about it more… when you say “there’s no source of complex value in the world besides humans”, do you mean to suggest that aliens with equally complex incompatible values simply can’t exist, or that if they did exist EY’s conclusions would change in some way to account for them?
I believe that EY definitively rejected the idea of there being an objective morality back in 2003 or thereabouts. Unless I am forgetting something from the metaethics sequence.
The whole point of CEV is to create a “superior” morality, though I think that’s too value-loaded a word to use; the better word is “extrapolated”. The whole idea of Friendly AI is to create a moral agent that continues to progress. So I’m not sure why you’re claiming that EY considers the notion of moral self-evaluation in AI unnecessary. Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
To respond to your last statement, no to both. Of course aliens with equally complex incompatible values can exist, and I’m sure they do in some faraway place. Those aliens don’t live here, though, so I’m not sure why we’d want to build a Friendly AI for their values rather than our own. The idea of building a Friendly AI is to ensure some kind of “metamoral continuity” through the intelligence explosion.
To some extent, I think we may be talking past each other when I talk about values and you reply about moralities.
To clarify: would you say that this process you refer to of creating a different “morality” (whether it’s different by virtue of being superior or extrapolated or something else is beside my point right now) keeps values fixed, or not?
I think it depends on what is meant by “values”. I would say that the values change while the fundamental motivations are fixed, though Vladimir’s response makes me unsure about this. Another way of saying it is that supergoals are fixed but the “Friendliness content” changes. (Though I haven’t seen the phrase “Friendliness content” around much lately, perhaps it’s being discarded in favor of more formal terms.)
Maybe another useful distinction would be between Friendliness structure and content (see the CFAI entry on the wiki).
I have to admit, the proliferation of terms in this discussion is making me less and less confident that I understand what you meant when you corrected me initially, despite several attempts to clarify it. So I’m going to suggest that we roll back and try this again, keeping our working vocabulary as well-defined as we can.
As I understand EY’s account:
He endorses building an optimization process (that is, a process that acts to maximize the amount of some specified target) that uses as its target the set of human terminal values (that is, the things that we want for their own sake, rather than wanting because we believe they’ll get us something else).
He also endorses building this process in such a way that it will improve itself as required so as to be able to exert superhuman optimizing power towards its target. The term “Friendly AI” refers to processes of this sort—that is, self-improving superhuman optimization processes that use as their target the set of human terminal values.
He also endorses a particular process (building a seed AI that analyzes humans) as a way of identifying the set of human terminal values. The term “CEV” (or, sometimes, “CEV(humanity)”) refers to the output of such an analysis.
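(To make sure I’m parsing those three pieces the same way you are, here’s a minimal sketch, in illustrative Python-style pseudocode, of how I understand them fitting together. None of the names in it come from EY’s writing; they’re just placeholders for the concepts.)

```python
# Purely illustrative: none of these names come from EY's writing; they are
# placeholders for the concepts in the three points above.

def seed_ai_extrapolate(humans):
    """Analyze actual humans and return CEV(humanity): an explicit
    representation of the set of human terminal values."""
    raise NotImplementedError("the hard part; this sketch only shows where it sits")


class Optimizer:
    """A process that acts to maximize some specified target, improving
    itself as required to exert superhuman optimizing power toward it."""

    def __init__(self, target):
        self.target = target  # fixed once, at construction

    def self_improve(self):
        ...  # rewrite own code, acquire resources, etc.

    def steer(self, world):
        ...  # act on the world so as to move it toward self.target

    def run(self, world):
        while True:
            self.self_improve()
            self.steer(world)


# "Friendly AI", on this reading, is an Optimizer whose target is the output
# of the seed AI's analysis, i.e. something like (not runnable as written,
# since `humans` and `world` are undefined placeholders):
#     friendly_ai = Optimizer(target=seed_ai_extrapolate(humans))
#     friendly_ai.run(world)
```

If the relationship between the seed AI, CEV, and the optimizer is wrong in that sketch, that’s probably also where my misunderstanding lives.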
He endorses all of this not only as pragmatic for our purposes, but also as the morally right thing to do. Even if there’s an equally complex species out there whose terminal values differ from ours, on EY’s account the morally right thing to do is optimize the universe for our terminal values rather than for theirs or for some compromise between the two. Members of that species might believe that humans are wrong to do so, but if so they’ll be mistaken.
I understand that you believe I’m mistaken about some or all of the above. I’m really not clear at this point on what you think is mistaken, or what you think is true instead.
Can you edit the above to reflect where you think it’s mistaken?
The only part I disagree with strongly is the language of the last point. Referring to CEV as “THE morally right thing to do” makes it seem as if it were set in stone as the guaranteed best path to creating FAI, which it isn’t. EY argues that building Friendly AI instead of just letting the chips fall where they may is the morally right thing to do, and I’d agree with that, but not that CEV specifically is the right thing to do.
One general design goal for FAI is to target outcomes “at least as good” as those that would be brought about by benevolent human mind upload(s). So the kind of “moral development” that a community of uploads would undergo should be encapsulated within a FAI. In fact, any beneficial area of the moral state space that would be accessible starting from humans, or from any combination of humans and tools, should be accessible to a good FAI design. CEV is one proposal for such a design.
As I understand it, yes, the thinking is to optimize for our terminal values rather than for those of this hypothetical alien species, or for some compromise between the two. However, if values among different intelligent species converge given greater intelligence, knowledge, and self-reflection, then we would expect our FAI’s goals to converge with those of the alien FAI. If values do not converge, then we would expect our FAI to have different values than the alien FAIs do.
A “terminal value” might include carefully thinking through philosophical questions such as this and designing the best goal content possible given these considerations. So, if there are hypothetical alien values that seem “correct” (or simply sufficiently desirable from the subjective perspective) to extrapolated humanity, these values would be integrated into the CEV-output.
I agree that EY does not assert that his proposed process for defining FAI’s optimization target (that is, a seed AI calculating CEV) is necessarily the best path to FAI, nor that that particular process is the morally right one. Correction accepted.
And yes, I agree that on EY’s account, given an alien species whose values converge with ours, a system that optimizes for our terminal values also optimizes for theirs.
Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
FAI’s goals should be fixed, unchanging (by initial design). I see three things related to a FAI that could be described as involving a “changing morality”. First, the definition of FAI’s unchanging goals could take a form in which it makes sense to talk about a process of change in provisional goals, but that process of change would itself be part of the definition of the unchanging result. For something like CEV, we might say that CEV is the first stage: it collects initial data from humans, tries to “extrapolate” goals from this data, decides whether it can formulate FAI’s goals, and if successful runs a FAI with these (fixed) goals. (I sketch this staged structure at the end of this comment.)
Second, the world managed by FAI might contain agents with changing morality, if the FAI decides that such agents are the right thing to create or maintain, according to its own fixed morality.
And third, FAI itself might take significant time to understand the logical implications of the fixed definition of its morality, either in general or as applied to particular (hypothetical) situations. Even the mathematics that human mathematicians do from elementary axioms is quite complicated; useful parts of the mathematics of human value might take billions of years to figure out.
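To make the first point concrete, here is a rough sketch of the staged structure I have in mind. It is illustrative only; every name in it is a placeholder, not an actual design.

```python
# Illustrative only; every name here is a placeholder, not an actual design.

def collect_initial_data(humans): ...       # stage 1a: look at actual humans
def extrapolate(data): ...                  # stage 1b: what they would want if they knew more, etc.
def can_formulate_fixed_goals(goals): ...   # stage 1c: did the extrapolation cohere?
def formalize(goals): ...
def run_fai_with(fixed_goals): ...          # stage 2: these goals never change afterwards


def cev_first_stage(humans):
    """All of the 'change' is confined to the inside of this definition;
    whatever it hands off, if anything, is fixed."""
    data = collect_initial_data(humans)
    provisional_goals = extrapolate(data)   # provisional, and revisable in here
    if not can_formulate_fixed_goals(provisional_goals):
        return None                         # fail rather than guess
    return run_fai_with(formalize(provisional_goals))
```

The point is only that the process of change is confined to the interior of the first stage; by the time anything is steering the world, its goals are no longer changing.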