I would also like to see this discussion. It isn’t terribly clear to me why the extinction of the human race and its replacement with some non-human AI is an inherently bad outcome. Why keep around and devote resources to human beings, who at best can be seen as sort of a prototype of true intelligence, since that’s not really what they’re designed for?
While imagining our extinction at the hands of our robot overlords seems unpleasant, if you imagine a gradual cyborg evolution to a post-human world, that seems scary, but not morally objectionable. Besides the Ship of Theseus, what’s the difference?
A long time ago, a different person who also happens to be named “Eliezer Yudkowsky” said that, in the event of a clash between human beings and superintelligent AIs, he would side with the latter. The Yudkowsky we all know rejects this position, though it is not clear to me why.
Not clear why? Because he likes people and doesn’t want everyone he knows (including himself), everyone he doesn’t know and any potential descendants of either to die? Doesn’t that sound like a default position? Most people don’t want themselves to go extinct.
“Superintelligent AIs” is not one thing, it’s a class of quadrillions of different possible things. The old Eliezer was probably thinking of one thing when he referred to superintelligences. When you realize that SAIs are a category of beings with more potential diversity than all species that have ever lived, it’s hard to side with them all as a group. You’d have to have poor aesthetics to value them all equally.
Thanks for the clarification. My understanding is that (the current) Eliezer doesn’t merely claim that we shouldn’t value all superintelligent AIs equally; he makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict. This stronger claim seems much harder to defend precisely in light of the fact that the space of possible AIs is so vast. Surely there must be some AIs in this heterogeneous group whose survival is preferable to that of creatures like us?
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard. E.g. here:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
“Well,” says the one, “maybe according to your provincial human values, you wouldn’t like it. But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals. And that’s fine by me. I’m not so bigoted as you are. Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things—”
My friend, I have no problem with the thought of a galactic civilization vastly unlike our own… full of strange beings who look nothing like me even in their own imaginations… pursuing pleasures and experiences I can’t begin to empathize with… trading in a marketplace of unimaginable goods… allying to pursue incomprehensible objectives… people whose life-stories I could never understand.
That’s what the Future looks like if things go right.
If the chain of inheritance from human (meta)morals is broken, the Future does not look like this. It does not end up magically, delightfully incomprehensible.
With very high probability, it ends up looking dull. Pointless. Something whose loss you wouldn’t mourn.
That’s helpful. I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive. If this is so, I think it’s misleading to use the locution ‘friendly AI’ to designate such artificial agents, and am inclined to believe that many folks who are sympathetic to the goal of creating friendly AI wouldn’t be if they knew what was actually meant by that expression.
Not “that doesn’t sound quite right”, but “that’s completely wrong”. Friendly AI is defined as “human-benefiting, non-human harming”.
I would say that the defining characteristic of Friendly AI, as the term is used on LW, is that it optimizes for human values.
On this view, if it turns out that human values prefer that humans be harmed, then Friendly AI harms humans, and we ought to prefer that it do so.
That’s not the proper definition… Friendly AI, according to current guesses/theory, would be an extrapolation of human values. The extrapolation part is everything. I encourage you to check out that linked document; the system it defines (though just a rough sketch) is what is usually meant by “Friendly AI” around here. No one is arguing that “human values” = “what we absolutely must pursue”. I’m not sure that creating Friendly AI, a machine that helps us, should be considered as passing a moral judgment on mankind or the world. At least, that seems like a rather informal way of looking at it, and probably unhelpful, as it’s imbued with so much moral valence.
Let’s backtrack a bit.
I said:
[Eliezer] makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict.
Kaj replied:
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard.
I then said:
I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive.
But now you reply:
Friendly AI is defined as “human-benefiting, non-human harming”.
It would clearly be wishful thinking to assume that the countless forms of AIs that “could be genuinely better than us in every regard” would all act in friendly ways towards humans, given that acting in other ways could potentially realize other goals that these superior beings might have.
That doesn’t sound quite right either, given Eliezer’s unusually strong anti-death preferences. (Nor do I think most other SI folks would endorse it; I wouldn’t.)
ETA: Friendly AI was also explicitly defined as “human-benefiting” in e.g. Creating Friendly AI:
The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
Even though Eliezer has declared CFAI as outdated, I don’t think that particular bit is.
As I understand Eliezer’s current position, it is that the right thing to optimize the universe for is the set of things humans collectively value (aka “CEV(humanity)”).
On this account the space of all possible optimizing systems (aka “AIs” or “AGIs”) can be divided into two sets: those which optimize for CEV(humanity) (aka “Friendly AIs”), and those which optimize for something else (aka “Unfriendly AIs”).
And Friendly AIs are the right thing to “side with”, as you put it here, because CEV(humanity) is on this account the right thing to optimize for.
On this account, “why side with Friendly AI over Unfriendly?” is roughly equivalent to asking “why do the right thing?”
The survival of creatures like us is entirely beside the point. Maybe CEV(humanity) includes the survival of creatures like us and maybe it doesn’t.
Now, you might ask, why is CEV(humanity) the right thing to optimize the universe for, as opposed to something else? To which I think Eliezer’s reply is that this is simply what it means to be right; things are right insofar as they correspond to what humans collectively value.
Some people (myself among them) find this an unconvincing argument. That said, I don’t think anyone has made a convincing argument that some specific other thing is better to optimize for, either.
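To lay the account above out symbolically (the notation here, a space of optimizers and a target map, is purely illustrative and not anything Eliezer himself uses):

```latex
% Illustrative formalization only; the symbols are not from the original discussion.
% Let $\mathcal{A}$ be the space of optimization processes and $T(a)$ the target that $a$ optimizes for.
\[
  \mathrm{Friendly}   = \{\, a \in \mathcal{A} \mid T(a) = \mathrm{CEV}(\mathrm{humanity}) \,\}, \qquad
  \mathrm{Unfriendly} = \mathcal{A} \setminus \mathrm{Friendly}.
\]
% On the account described above, ``right'' just is what CEV(humanity) endorses, so
\[
  \text{``side with Friendly over Unfriendly''} \iff \text{``do the right thing.''}
\]
```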
To which I think Eliezer’s reply is that this is simply what it means to be right; things are right insofar as they correspond to what humans collectively value.
No. The argument is more that there’s no source of complex value in the world besides humans, and that writing complex values line by line would take thousands of years, so we are forced to use some combination and/or extrapolation of human values, whether we want to or not.
If you have citations for EY articulating the idea that writing superior nonhuman values would take too long to do, rather than that it’s fundamentally incoherent, I’d be interested. This would completely change my understanding of the whole Metaethics Sequence.
Whole brain emulation would basically be “copying” human values in a machine, and would demonstrate that “writing” human values is possible. You could then edit a couple morally relevant bits, and you’d be demonstrating that you could “create” a human-like but slightly edited morality. Evaluating whether it is “superior” by some metric would be a whole additional exercise, though.
I don’t think the metaethics sequence implies that writing down values is impossible, just that human values are very complex and messy.
Sure, if we drop the idea of “superior,” I agree completely that it’s possible (in principle) to write a set of values, and that the metaethics sequence does not imply otherwise.
And, also, it implies—well, it asserts—that human values are very complex and messy, as you say.
IIRC, it also asserts that human values are right. Which is why I think that on EY’s view, evaluating whether the “edited morality” you describe here is superior to human values is not just an additional exercise, but an unnecessary (and perhaps incoherent) one. On his view, I think we can know a priori that it isn’t.
Actually, now that I think about it more… when you say “there’s no source of complex value in the world besides humans”, do you mean to suggest that aliens with equally complex incompatible values simply can’t exist, or that if they did exist EY’s conclusions would change in some way to account for them?
I believe that EY definitively rejected the idea of there being an objective morality back in 2003 or thereabouts. Unless I am forgetting something from the metaethics sequence.
The whole point of CEV is to create a “superior” morality, though I think that is too value-loaded a word to use; the better word is “extrapolated”. The whole idea of Friendly AI is to create a moral agent that continues to progress. So I’m not sure why you claim that, on EY’s view, moral self-evaluation in AI is unnecessary. Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
To respond to your last statement, no to both. Of course aliens with equally complex incompatible values can exist, and I’m sure they do in some faraway place. Those aliens don’t live here, though, so I’m not sure why we’d want to build a Friendly AI for their values rather than our own. The idea of building a Friendly AI is to ensure some kind of “metamoral continuity” through the intelligence explosion.
To some extent, I think we may be talking past each other when I talk about values and you reply about moralities.
To clarify: would you say that this process you refer to of creating a different “morality” (whether it’s different by virtue of being superior or extrapolated or something else is beside my point right now) keeps values fixed, or not?
I think it depends on what is meant by “values”. I would say that the values change while the fundamental motivations are fixed, though Vladimir’s response makes me unsure about this. Another way of saying it is that supergoals are fixed but the “Friendliness content” changes. (Though I haven’t seen the phrase “Friendliness content” around much lately; perhaps it’s being discarded in favor of more formal terms.)
Maybe another useful distinction would be between Friendliness structure and content (see the CFAI entry on the wiki).
I have to admit, the proliferation of terms in this discussion is making me less and less clear that I understand what was being said when you corrected me initially, despite several attempts to clarify it. So I’m going to suggest that we roll back and try this again, keeping our working vocabulary as well-defined as we can.
As I understand EY’s account:
He endorses building an optimization process (that is, a process that acts to maximize the amount of some specified target) that uses as its target the set of human terminal values (that is, the things that we want for their own sake, rather than wanting because we believe they’ll get us something else).
He also endorses building this process in such a way that it will improve itself as required so as to be able to exert superhuman optimizing power towards its target. The term “Friendly AI” refers to processes of this sort—that is, self-improving superhuman optimization processes that use as their target the set of human terminal values.
He also endorses a particular process (building a seed AI that analyzes humans) as a way of identifying the set of human terminal values. The term “CEV” (or, sometimes, “CEV(humanity)”) refers to the output of such an analysis.
He endorses all of this not only as pragmatic for our purposes, but also as the morally right thing to do. Even if there’s an equally complex species out there whose terminal values differ from ours, on EY’s account the morally right thing to do is optimize the universe for our terminal values rather than for theirs or for some compromise between the two. Members of that species might believe that humans are wrong to do so, but if so they’ll be mistaken.
I understand that you believe I’m mistaken about some or all of the above. I’m really not clear at this point on what you think is mistaken, or what you think is true instead.
Can you edit the above to reflect where you think it’s mistaken?
The only part I disagree with strongly is the language of the last point. Referring to CEV as “THE morally right thing to do” makes it seem as if it were set in stone as the guaranteed best path to creating FAI, which it isn’t. EY argues that building Friendly AI instead of just letting the chips fall where they may is the morally right thing to do, and I’d agree with that, but not that CEV specifically is the right thing to do.
One general goal for FAI is to target outcomes “at least as good” as those which would be caused by benevolent human mind upload(s). So, the kind of “moral development” that a community of uploads would undergo should be encapsulated within a FAI. In fact, any beneficial area of the moral state space that would be accessible starting from humans, or from any combination of humans and tools, should be accessible to a good FAI design. CEV is one proposal for such a design.
As I understand it, yes, the thinking is to optimize for our terminal values rather than for those of this hypothetical alien species, or for some compromise of the two. However, if values among different intelligent species converge given greater intelligence, knowledge, and self-reflection, then we would expect our FAI’s goals to converge with those of the alien FAI. If values do not converge, then we would expect our FAI to have different values from alien FAIs.
A “terminal value” might include carefully thinking through philosophical questions such as this and designing the best goal content possible given these considerations. So, if there are hypothetical alien values that seem “correct” (or simply sufficiently desirable from the subjective perspective) to extrapolated humanity, these values would be integrated into the CEV-output.
I agree that EY does not assert that his proposed process for defining FAI’s optimization target (that is, seed AI calculating CEV) is necessarily the best path to FAI, nor that that proposed process is particularly right. Correction accepted.
And yes, I agree that on EY’s account, given an alien species whose values converge with ours, a system that optimizes for our terminal values also optimizes for theirs.
Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
FAI’s goals should be fixed, unchanging (by initial design). I see three possible things related to a FAI that could be described as involving a “changing morality”. First, it’s possible that the definition of FAI’s unchanging goals could take a form where it makes sense to talk about some process of change in provisional goals, but this process of change would itself be part of the definition of the unchanging result. For something like CEV, we might say that CEV is a first stage that collects initial data from humans, tries to “extrapolate” goals from this data, decides whether it can formulate FAI’s goals, and, if successful, runs a FAI with these (fixed) goals (a toy sketch of this staged structure appears after the third point below).
Second, the world managed by FAI might contain agents with changing morality, if the FAI decides that agents with changing morality are the right thing to create or maintain, according to FAI’s fixed morality.
And third, FAI itself might take significant time to understand the logical implications of the fixed definition of its morality, either in general or as applied to particular (hypothetical) situations. Even the mathematics that human mathematicians do from elementary axioms is quite complicated; useful parts of the mathematics of human value might take billions of years to figure out.
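To make the first point concrete, here is a minimal toy sketch, in Python, of the two-stage structure described above: a provisional extrapolation stage whose output, once accepted, becomes a goal the running system never revises. Every name in it (Goal, extrapolate, run_fai) is hypothetical shorthand for machinery nobody yet knows how to build.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the goal object cannot be mutated once created
class Goal:
    description: str

def extrapolate(human_data: list[str]) -> Goal | None:
    """Stage one (illustrative): try to extrapolate a coherent goal from data
    collected from humans; return None if no coherent extrapolation is found."""
    if not human_data:
        return None
    return Goal(description="extrapolated from: " + "; ".join(human_data))

def run_fai(goal: Goal) -> None:
    """Stage two (illustrative): act under a goal that stays fixed for the whole run.
    The system may keep learning what the goal implies, but never rewrites the goal."""
    print(f"Optimizing indefinitely for the fixed goal: {goal.description}")

if __name__ == "__main__":
    provisional = extrapolate(["initial data collected from humans"])
    if provisional is not None:  # only if extrapolation succeeds is any FAI run at all
        run_fai(provisional)
```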
Yeah, that’s an interesting question. I’ll offer a conjecture.
From my understanding, one of the fundamental assumptions of FAI is that there is, somehow, a stable moral attractor for every AI in the local neighborhood of its original goals, or perhaps only that such an attractor is possible. No matter how intelligent the machine gets, no matter how many times it improves itself, it will consciously attempt to stay in the local neighborhood of this point (à la the Gandhi murder-pill analogy).
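One way to make the “stable attractor” assumption concrete is the usual goal-stability argument: a self-modifying agent evaluates candidate rewrites of itself using its current goals, so rewrites that would change those goals get rejected (the Gandhi murder-pill point). A toy sketch under that assumption; the paperclip/staple setup is purely illustrative:

```python
import random

def paperclip_utility(world: dict) -> float:
    """The agent's current, fixed criterion: more paperclips is better."""
    return float(world.get("paperclips", 0))

def staple_utility(world: dict) -> float:
    """A candidate self-modification: start valuing staples instead."""
    return float(world.get("staples", 0))

def accepts_modification(current, candidate, sample_worlds) -> bool:
    """Gandhi-pill logic: score the outcome the *candidate* would steer toward
    using the *current* utility function, and accept only if nothing is lost."""
    best_under_current = max(sample_worlds, key=current)
    best_under_candidate = max(sample_worlds, key=candidate)
    return current(best_under_candidate) >= current(best_under_current)

sample_worlds = [{"paperclips": random.randint(0, 10), "staples": random.randint(0, 10)}
                 for _ in range(200)]

# Judged by its current values, the paperclip agent (almost surely) refuses to become a staple agent.
print(accepts_modification(paperclip_utility, staple_utility, sample_worlds))
```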
If an AI is designed with a moral attractor that is essentially random, and thus probably totally antithetical to human values (such as paperclip manufacture), then it’s hard to be on the side of the machines. Giving control of the world over to machine super-intelligences sounds like an okay idea if you imagine them growing, doing science, populating the universe, etc., but if they just tear apart the world to make paperclips in an exceptionally clever manner, then perhaps it isn’t such a good idea. That is to say, if the machines use their intelligence to derive their morality, then siding with the machines is all well and good, but if their morality is programmed from the start, and the machines are merely exceptionally skilled morality executors, then there’s no good reason to be on the side of the machines just because they execute their random morality much more effectively.
I am fairly hesitant to agree with the idea of the moral attractor, along with the goals of FAI in general. I understand the idea only through analogy, which is to say not at all, and I have little idea what would dictate the peaks and valleys of a moral landscape, or even the coordinates, really. It also isn’t clear to me that a machine of such high intelligence would be incapable of forming new value systems, and perhaps discarding its preference for paperclips if there were no more paper to clip together.
While I’m exploring a very wide hypothesis space here about a person I know essentially nothing about, this sort of reasoning is at least consistent with what appears to be the thinking that undergirds work on FAI.
It also raises a very interesting question, which is perhaps more fundamental, and that is whether moral preferences are a function of intelligence or not. If so, the beings far more intelligent than us would presumably be more moral, and have a reasonable claim for our moral support. If not, then they’re simply more clever and more powerful, and neither is a particularly good reason to welcome our robot overlords.
An idea I just had, which I’m sure others have considered but which I will merely note here, is that a recursively self-modifying AI would be subject to Darwinian evolution, with lines of code analogous to individual genes; and indeed, if there is a stable attractor for such an AI, it seems likely to be about as moral as evolution, which is not particularly encouraging.
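To make that last conjecture concrete: when variants of a self-modifying system compete, selection rewards whatever replicates best, and nothing in the loop refers to anything we would call morality. A toy illustration under those assumptions (the bit-string “genome” is purely hypothetical, not a claim about real AI dynamics):

```python
import random

GENOME_LEN = 20
POP_SIZE = 50

def replication_fitness(genome: list[int]) -> int:
    """Selection cares only about how well a variant propagates (here, the count of 1-bits);
    no moral criterion appears anywhere in this loop."""
    return sum(genome)

def mutate(genome: list[int], rate: float = 0.05) -> list[int]:
    return [1 - bit if random.random() < rate else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]

for _ in range(100):
    # Blind variation plus selection: copy genomes in proportion to replication fitness, then mutate.
    weights = [replication_fitness(g) + 1 for g in population]  # +1 keeps every weight positive
    population = [mutate(random.choices(population, weights=weights)[0]) for _ in range(POP_SIZE)]

print("mean replication fitness after selection:",
      sum(replication_fitness(g) for g in population) / POP_SIZE)
```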