I’ve been interpreting “Friendly AI” to mean something like:
A system that acts as a really powerful expected utility maximizer
whose utility function is specified
and whose utility function is desirable.
I intend this to be consistent with Eliezer’s definition but I can’t be certain.
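For concreteness, here is a minimal toy sketch of what I mean by “acts as an expected utility maximizer” over a specified utility function. The action names, probabilities, and utility numbers below are made up purely for illustration, and whether the specified utility function is desirable is exactly the part no code can settle.

```python
# Toy sketch of an expected utility maximizer in a discrete setting.
# All names and numbers here are hypothetical illustration.

def expected_utility(action, outcome_probs, utility):
    """Sum of P(outcome | action) * U(outcome) over possible outcomes."""
    return sum(p * utility[outcome] for outcome, p in outcome_probs[action].items())

def choose_action(actions, outcome_probs, utility):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))

# The "specified" utility function is just a lookup table over outcomes.
actions = ["wait", "act"]
outcome_probs = {
    "wait": {"status_quo": 1.0},
    "act":  {"better_world": 0.9, "worse_world": 0.1},
}
utility = {"status_quo": 0.0, "better_world": 1.0, "worse_world": -10.0}

# Prints "wait": 0.9 * 1.0 + 0.1 * (-10.0) = -0.1, which is below 0.0.
print(choose_action(actions, outcome_probs, utility))
```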
As the OP notes, this is a strict subset of “AI systems designed to be of low risk,” and Armstrong, Sandberg & Bostrom appear confused on this point. They’re citing an old Yudkowsky paper from 2001 (which I understand is no longer canonical?), so I’m hoping this is a simple slip rather than a confusion of ideas.
As the OP also notes, there’s some potential confusion about the meaning of a “desirable” utility function here. Does it have to be an “ideal extrapolation of our values” (this may be one of the concepts Luke worries is “incoherent”), or does it just have to be “good enough”?
Is “good enough” only good enough if it allows itself to be upgraded, like a Nanny AI? (Most of the time we expect utility maximizers to squelch competitors with different utility functions, so this provision would need to be encoded explicitly.)
While I ordinarily try to stay out of the exegesis business, I will observe that EY’s failed utopia story seems to suggest that “good enough” is not compatible with his goal.
IIRC EY even agrees in the comments that the state-change implemented by that optimizer is a net improvement, but nevertheless implies that leaving a value out of the list of things to preserve/maximize means we ought to prefer that such an optimizer not be run. (Exactly which value was left out is importantly unclear: perhaps the one that scores existing relationships higher than otherwise-better new ones, or the one that scores relationships to other humans higher than relationships to nonhuman entities created for the purpose of having such a relationship, or something else. But the story definitely suggests that some value was left out of the mix.)
EDIT: Someone actually bothered to do the research below, and it seems IDRC. It’s not that we ought to prefer that such an optimizer not be run; it’s that we ought to prefer that the (fictional) process that led to that optimizer not be implemented, since in most worlds where it is implemented the result, unlike in the world depicted, is worse than the status quo. (This is why I try to stay out of exegesis.)
This comment seems to imply that EY would prefer that such an optimizer be run, if the only other option were business-as-usual.
Okay, just to disclaim this clearly, I probably would press the button that instantly swaps us to this world—but that’s because right now people are dying, and this world implies a longer time to work on FAI 2.0.
But the Wrinkled Genie scenario is not supposed to be probable or attainable—most programmers this stupid just kill you, I think.
EDIT: That doesn’t imply it earns the label “Friendly”, though.