I suspect having a good estimation of the “human utility function” (even stripped of biases etc.) is not the hardest part of the problem. A “perfect” human, given great power and the ability to self-modify, may still produce a disaster. Human morality is mostly calibrated for dealing with others of around the same power.
Well, human values probably vary to some degree between humans, so a Friendly AI wouldn’t so much ‘maximize the generic human utility function’ as ‘take all the human utility functions you can find as of now, find the portions that are reflexively consistent, weight them by frequency, and take the actions best supported by the convergent portions of those utility functions.’ At least, that was the gist of CEV circa 2004. Not sure what Eliezer and co. are working on these days, but that sounds like a reasonable way to build a nice future to me. A fair one, at least.
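To make the aggregation idea concrete, here is a toy sketch in Python. It is not CEV or anyone’s actual proposal; the “reflexive consistency” check and the frequency weighting are stand-ins I made up purely to illustrate the shape of “filter, weight, pick the convergent winner”:

```python
# Toy illustration of aggregating stated preference orderings:
# keep only the self-consistent ones, weight them by how often they
# appear, and pick the action with the most weighted support.
from collections import Counter

def is_reflexively_consistent(prefs):
    """Stand-in check: an ordering is 'consistent' here if it never
    ranks one action both above and below another."""
    seen = set()
    for better, worse in prefs:
        if (worse, better) in seen:
            return False
        seen.add((better, worse))
    return True

def convergent_choice(all_prefs, actions):
    """Weight each consistent ordering by frequency and score actions
    by how much weighted support ranks them above some alternative."""
    consistent = [tuple(p) for p in all_prefs if is_reflexively_consistent(p)]
    weights = Counter(consistent)
    scores = {a: 0 for a in actions}
    for prefs, w in weights.items():
        for better, _ in prefs:
            if better in scores:
                scores[better] += w
    return max(scores, key=scores.get)

# Example: three people, two of whom share the same ordering.
people = [
    [("help", "harm"), ("share", "hoard")],
    [("help", "harm"), ("share", "hoard")],
    [("help", "hoard")],
]
print(convergent_choice(people, ["help", "harm", "share", "hoard"]))  # -> "help"
```

The real problem is of course vastly harder (extrapolation, not just aggregation), but the sketch shows why the output is “fair” in the weak sense the comment means: no single person’s preferences dominate except insofar as they converge with others’.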