I wonder if you could comment on the desirability, from the perspective of Friendly AI, of trying to figure out the “human utility function” directly.
As I understand it, the current default strategy for Friendliness is to point the seed AI at an instance of humanity and say: turn yourself into an ideal moral agent, as defined by the implicit morality of that entity over there. This requires a method of determining the decision procedure of the entity in question (including some way of neglecting inessentials), a way of extrapolating a species-relative ideal morality from the utility-function-analogue thereby deduced (this is to be the role of "reflective decision theory"), and also some notion of interim Friendliness which ensures (e.g.) that bad things aren't done to the human species by the developing AI while it's still figuring out true Friendliness. If all this can be coupled to a working process of raw cognitive self-enhancement in the developing AI, then success: the first superhuman intelligence will be a Friendly one.
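To make the "utility-function-analogue" step a little more concrete, here is a toy sketch, entirely my own illustration rather than anyone's actual proposal, of the simplest version of "deducing" a utility function from observed behaviour: fit a softmax choice model to pairwise choices by maximum likelihood. All of the names and data below are hypothetical.

```python
# Toy illustration only (my own sketch, not any actual Friendliness proposal):
# infer a crude "utility-function-analogue" from observed pairwise choices
# by fitting a softmax (logistic) choice model with maximum likelihood.
import math

def infer_utilities(options, choices, steps=2000, lr=0.05):
    """Gradient ascent on the log-likelihood of a softmax choice rule.
    `choices` is a list of (chosen_option, rejected_option) pairs."""
    u = {o: 0.0 for o in options}
    for _ in range(steps):
        grad = {o: 0.0 for o in options}
        for chosen, rejected in choices:
            # Probability the model assigns to the *wrong* choice; this is
            # also the gradient of log sigmoid(u[chosen] - u[rejected]).
            p_wrong = 1.0 / (1.0 + math.exp(u[chosen] - u[rejected]))
            grad[chosen] += p_wrong
            grad[rejected] -= p_wrong
        for o in options:
            u[o] += lr * grad[o]
    return u

# Hypothetical observations of the agent the analysis is pointed at.
observed = [("help", "harm"), ("help", "ignore"), ("ignore", "harm")] * 10
print(infer_utilities({"help", "harm", "ignore"}, observed))
```

The real problem is of course vastly harder (neglecting inessentials, handling inconsistent preferences, and so on); the sketch only shows the shape of the inference.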
This is a logical strategy, and it really does define a research program that might issue in success. However, computational and cognitive neuroscientists are already working to characterize the human brain as a decision system. In an alternative scenario, it would not be necessary to delegate the analysis of human decision procedures to the seed AI, because humans would already have worked it out. That human-derived analysis could then serve as an epistemic check for the AI: give it some of the data and see whether it reaches the same conclusions, just as basic physics offers a way to test an AI's ability to construct theories. But we don't need an AI to figure out, say, QED for us; we already did that ourselves. The same thing may happen with the study of human morality.
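And here is an equally toy sketch of the "epistemic check" idea: hold out some of the human choice data and see whether the AI's independently inferred model agrees with the human-derived one on the held-out cases. Again, every name and number here is hypothetical.

```python
# Toy sketch of the "epistemic check": do the AI-derived and human-derived
# utility assignments predict the same preferences on held-out choices?
def predictive_agreement(ai_u, human_u, held_out_pairs):
    """Fraction of held-out pairs (a, b) on which both utility assignments
    rank a and b the same way."""
    agree = sum(
        (ai_u[a] > ai_u[b]) == (human_u[a] > human_u[b])
        for a, b in held_out_pairs
    )
    return agree / len(held_out_pairs)

# Hypothetical fits: one produced by the human research program, one by the AI.
human_model = {"help": 1.2, "ignore": 0.1, "harm": -1.3}
ai_model = {"help": 0.9, "ignore": 0.2, "harm": -0.8}
held_out = [("help", "ignore"), ("ignore", "harm"), ("help", "harm")]
print(predictive_agreement(ai_model, human_model, held_out))  # -> 1.0
```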