Of course, at this point the terminology “Friendly” becomes misleading, and we should talk about a Goal-X-controlled AGI, where Goal X is a variable standing for whatever goal that AGI would optimize for.
There is no unique value for X. Some have suggested the output of CEV as the goal system, but if you look at CEV in detail, you see that it is jam-packed with parameters, all of which make a difference to the actual output.
I would personally lobby against the idea of an AGI that did crazy shit like killing existing people to save a few nanoseconds.
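A minimal sketch of what “Goal X as a variable” means in practice (everything below is a made-up toy, not anyone’s actual proposal): the agent is the same object whichever goal you hand it, and a single free parameter in a hypothetical CEV-style goal constructor already flips its verdict on the people-versus-nanoseconds example above.

```python
from dataclasses import dataclass
from typing import Callable, List

WorldState = dict  # toy stand-in for a full description of an outcome

@dataclass
class GoalXAGI:
    """Toy 'Goal-X-controlled AGI': the machinery is fixed, only the goal X varies."""
    goal: Callable[[WorldState], float]  # X: how good the AGI rates a world

    def choose(self, options: List[WorldState]) -> WorldState:
        # The agent simply steers toward whichever reachable world its goal rates highest.
        return max(options, key=self.goal)

def cev_like_goal(weight_existing_people: float) -> Callable[[WorldState], float]:
    """Hypothetical stand-in for a CEV-style goal constructor. A real specification
    has many more free parameters than this one; the point is that each such
    parameter changes what the resulting AGI actually does."""
    def goal(world: WorldState) -> float:
        lives = world.get("existing_people_kept", 0)
        speed = world.get("nanoseconds_saved", 0)
        return weight_existing_people * lives + (1 - weight_existing_people) * speed
    return goal

# Same architecture, different value of X, opposite verdicts on the same options:
options = [{"existing_people_kept": 100, "nanoseconds_saved": 0},
           {"existing_people_kept": 0, "nanoseconds_saved": 5}]
humane = GoalXAGI(cev_like_goal(weight_existing_people=0.99))
speedy = GoalXAGI(cev_like_goal(weight_existing_people=0.01))
print(humane.choose(options))  # keeps the existing people
print(speedy.choose(options))  # kills them to save a few nanoseconds
```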
Hm, I’ve noticed before that the term ‘Friendly’ is sort of vague. What would I call an AI that optimizes strictly for my goals (and if I care about others’ goals, so be it)? A Will-AI? I’ve said a few times ‘your Friendly is not my Friendly’ but I think I was just redefining Friendliness in an incorrect way that Eliezer wouldn’t endorse.
What would I call an AI that optimizes strictly for my goals...A Will-AI?
One could say “Friendly towards Will.”
But the problem of nailing down your goals seems to me much harder than the problem of negotiating goals between different people. Thus I don’t see a problem with being vague about the target of Friendliness.
But the problem of nailing down your goals seems to me much harder than the problem of negotiating goals between different people. Thus I don’t see a problem with being vague about the target of Friendliness.
Agreed. And asking what the preference of a specific person is, represented in some formal language, seems to be a natural simplification of the problem statement, and something that needs to be understood before the problem of preference aggregation can be approached.
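A hedged sketch of why the single-person problem comes first (all names below are illustrative, not part of any real proposal): an aggregation rule consumes already-formalized individual preferences, so it cannot even be written down until “the preference of one person, in some formal language” has a representation.

```python
from typing import Callable, List

WorldState = dict  # toy stand-in for a formal description of an outcome
Preference = Callable[[WorldState], float]  # one person's formalized preference

def wills_preference(world: WorldState) -> float:
    """Illustrative placeholder: some formal representation of one person's preference."""
    return float(world.get("wills_goals_satisfied", 0))

def aggregate(prefs: List[Preference], weights: List[float]) -> Preference:
    """A deliberately naive aggregation rule (weighted sum). Its input type is
    'already-formalized individual preference', which is why the single-person
    problem has to be understood before aggregation can even be stated."""
    def combined(world: WorldState) -> float:
        return sum(w * p(world) for w, p in zip(weights, prefs))
    return combined
```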
but I think I was just redefining Friendliness in an incorrect way that Eliezer wouldn’t endorse.
Beware of the urge to censor thoughts that disagree with authority. I personally agree that there is a serious issue here: moral antirealism implies that there is no “canonical human notion of goodness”, so the terminology “Friendly AI” is actually somewhat misleading. It might be better to say “average human extrapolated morality AGI” when that is what we want to talk about, e.g.
“an average human extrapolated morality AGI would oppose a paperclip maximizer”.
Then it sounds less onerous to say that you disagree with what an average human extrapolated morality AGI would do than that you disagree with what a “Friendly AI” would do, because most people on this forum disagree with averaged-out human morality (for example, the average human is a theist). Contrast:
“What, you disagree with the FAI? Are you a bad guy then?”
“Friendly AI” is about as specific/ambiguous as “morality”—something humans mostly have in common, allowing for normal variation, not referring to details about specific people. As with the preference (morality) of specific people, we can speak of an FAI optimizing the world to the preference of specific people. Naturally, for each given person it’s preferable to launch a personal-FAI rather than a consensus-FAI.
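The closing claim can be made concrete with a toy calculation (the worlds and scores below are invented for illustration): if an FAI picks the world its target preference rates highest, then targeting your own preference can never leave you worse off, by your own lights, than targeting a consensus.

```python
from typing import Callable, List

WorldState = str
Preference = Callable[[WorldState], float]

worlds: List[WorldState] = ["A", "B", "C"]

# Invented scores for two people's preferences over three candidate worlds.
mine: Preference = lambda w: {"A": 10, "B": 4, "C": 6}[w]
theirs: Preference = lambda w: {"A": 1, "B": 9, "C": 6}[w]
consensus: Preference = lambda w: mine(w) + theirs(w)  # naive consensus

def launch_fai(target: Preference) -> WorldState:
    # The FAI steers into whichever candidate world its target preference rates highest.
    return max(worlds, key=target)

print(launch_fai(mine))       # 'A' -- my score: 10
print(launch_fai(consensus))  # 'B' -- my score: 4
assert mine(launch_fai(mine)) >= mine(launch_fai(consensus))
```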