That’s not the proper definition… Friendly AI, according to current guesses/theory, would be an extrapolation of human values. The extrapolation part is everything. I encourage you to check out that linked document; the system it defines (though just a rough sketch) is what is usually meant by “Friendly AI” around here. No one is arguing that “human values” = “what we absolutely must pursue”. I’m not sure that creating Friendly AI, a machine that helps us, should be considered as passing a moral judgment on mankind or the world. At least, that seems like a really informal way of looking at it, and probably an unhelpful one, since it’s imbued with so much moral valence.
Let’s backtrack a bit.
I said:
[Eliezer] makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict.
Kaj replied:
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard.
I then said:
I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive.
But now you reply:
Friendly AI is defined as “human-benefiting, non-human-harming”.
It would clearly be wishful thinking to assume that the countless forms of AIs that “could be genuinely better than us in every regard” would all act in friendly ways towards humans, given that acting in other ways could potentially realize other goals that these superior beings might have.
Not “that doesn’t sound quite right”, but “that’s completely wrong”. Friendly AI is defined as “human-benefiting, non-human-harming”.
I would say that the defining characteristic of Friendly AI, as the term is used on LW, is that it optimizes for human values.
On this view, if it turns out that human values prefer that humans be harmed, then Friendly AI harms humans, and we ought to prefer that it do so.