I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard. E.g. here:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
“Well,” says the one, “maybe according to your provincial human values, you wouldn’t like it. But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals. And that’s fine by me. I’m not so bigoted as you are. Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things—”
My friend, I have no problem with the thought of a galactic civilization vastly unlike our own… full of strange beings who look nothing like me even in their own imaginations… pursuing pleasures and experiences I can’t begin to empathize with… trading in a marketplace of unimaginable goods… allying to pursue incomprehensible objectives… people whose life-stories I could never understand.
That’s what the Future looks like if things go right.
If the chain of inheritance from human (meta)morals is broken, the Future does not look like this. It does not end up magically, delightfully incomprehensible.
With very high probability, it ends up looking dull. Pointless. Something whose loss you wouldn’t mourn.
That’s helpful. I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive. If this is so, I think it’s misleading to use the locution ‘friendly AI’ to designate such artificial agents, and am inclined to believe that many folks who are sympathetic to the goal of creating friendly AI wouldn’t be if they knew what was actually meant by that expression.
That’s not the proper definition… Friendly AI, according to current guesses/theory, would be an extrapolation of human values. The extrapolation part is everything. I encourage you to check out that linked document; the system it defines (though just a rough sketch) is what is usually meant by “Friendly AI” around here. No one is arguing that “human values” = “what we absolutely must pursue”. I’m not sure that creating Friendly AI, a machine that helps us, should be considered as passing a moral judgment on mankind or the world. At least, that seems like a really informal way of looking at it, and probably unhelpful, as it’s imbued with so much moral valence.
Let’s backtrack a bit. I said:
[Eliezer] makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict.
Kaj replied:
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard.
I then said:
I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive.
But now you reply:
Friendly AI is defined as “human-benefiting, non-human-harming”.
It would clearly be wishful thinking to assume that the countless forms of AIs that “could be genuinely better than us in every regard” would all act in friendly ways towards humans, given that acting in other ways could potentially realize other goals that these superior beings might have.
That doesn’t sound quite right either, given Eliezer’s unusually strong anti-death preferences. (Nor do I think most other SI folks would endorse it; I wouldn’t.)
ETA: Friendly AI was also explicitly defined as “human-benefiting” in e.g. Creating Friendly AI:
The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
Even though Eliezer has declared CFAI outdated, I don’t think that particular bit is.
Not “that doesn’t sound quite right”, but “that’s completely wrong”. Friendly AI is defined as “human-benefiting, non-human-harming”.
I would say that the defining characteristic of Friendly AI, as the term is used on LW, is that it optimizes for human values.
On this view, if it turns out that human values prefer that humans be harmed, then Friendly AI harms humans, and we ought to prefer that it do so.