How can we have Friendly AI even if we humans cannot agree about our ethical values? This is an SQ because the problem is so obvious that it was probably the first one solved, yet I cannot find the answer.
I have not finished the Sequences yet, but they sound a bit optimistic to me, as if basically everybody is a modern utilitarian and the rest of the people just don't count. To ask the really dumbest version of the question: what about religious folks? Is it just supposed to be a secular-values AI and they can go pound sand, or is some sort of agreement or compromise drawn up with them and then implemented? Is some generally agreed-upon system of Human Values a prerequisite?
My issue here is that if we want to listen to everybody, this will be a never-ending debate. And if you draw the line somewhere, e.g. include only people with reasonably utilitarian value systems, where exactly do you draw it, and so on?
As I told someone else, this pdf has preliminary discussion about how to resolve differences that persist under extrapolation.
The specific example of religious disagreements seems like a trivial problem to anyone who gets far enough to consider the question. Since there aren’t any gods, the AI can ask what religious people would want if they accepted this fact. (This is roughly why I would oppose extrapolating only LW-ers rather than humanity as a whole.) But hey, maybe the question is more difficult than I think—we wouldn’t specifically tell the AI to be an atheist if general rules of thinking did not suffice—or maybe this focus on surface claims hides some deeper disagreement that can’t be so easily settled by probability.
The “if we knew more, thought faster, were more the people we wished we were, had grown up farther together” CEV idea hopes that the disagreements are really just misunderstandings and mistakes in some sense. Otherwise, take some form of average or median, I guess?
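To give a concrete feel for what "take some form of average or median" could even mean, here is a toy sketch (purely illustrative, not from the CEV document: the issue names, the weights, and the aggregate_values helper are all invented): reduce each person's values to weights on a few issues, then combine them per issue with a mean or a median.

```python
# Toy sketch: each person's values as weights on a few named issues,
# combined per issue. All names and numbers are invented for illustration.
import statistics

def aggregate_values(people, method="median"):
    """people: list of dicts mapping an issue name to a weight in [0, 1]."""
    issues = sorted(set().union(*(p.keys() for p in people)))
    combine = statistics.median if method == "median" else statistics.mean
    return {issue: combine([p.get(issue, 0.0) for p in people]) for issue in issues}

alice = {"freedom": 0.9, "tradition": 0.2}
bob   = {"freedom": 0.4, "tradition": 0.8}
carol = {"freedom": 0.7, "tradition": 0.5}

print(aggregate_values([alice, bob, carol], "median"))  # freedom 0.7, tradition 0.5
print(aggregate_values([alice, bob, carol], "mean"))    # freedom ~0.67, tradition 0.5
```

Of course, deciding what counts as an "issue", whose weights get counted, and how to normalize them is exactly the part people disagree about, so a scheme like this only pushes the problem back a step.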
AFAIK this has not been solved, and if it has, I would love to hear about it too. I also believe that, like you said, while it's possible for humans to negotiate and agree, any agreement would clearly be a compromise, and not the simultaneous fulfillment of everyone's different values.
CEV has, IIRC, a lot of handwaving and unfounded assumptions about the existence and qualities of the One True Utility Function it’s trying to build. Is there something better?
Sorry, what is CEV?
Coherent Extrapolated Volition.
Or was that a snarky question referring to the fact that CEV is underspecified and may not exist?
No, it was a real question; I did not get that far in the book yet. Thanks.