Knowing (some of) what you want an algorithm to do is not the same as knowing what the algorithm is. It seems likely to me that neither Eliezer nor anyone else knows what a Friendly AI algorithm is. They have a partial understanding of it and would like to have more of one.
It is conceivable that some future research might (for instance) prove that Friendly AI is impossible, in the same sense that a general solution to the halting problem is impossible. In such a case, a person who subsequently said “I want to figure out how to build Friendly AI” would be engaging in wishful thinking.
Once upon a time, someone could have said, “I want to figure out how to build a general solution to the halting problem.” That person might have known what the inputs and outputs of a general solution to the halting problem would look like, but they could not have known what a general solution to it was, since there’s no such thing.
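To make the analogy concrete, here is a minimal Python sketch (the names `halts` and `paradox` are my own, purely illustrative) of how the halting-problem specification can be perfectly clear while the standard diagonal argument shows that nothing can satisfy it:

```python
# Sketch: the *spec* of a halting oracle is easy to write down,
# even though no implementation of it can exist.

def halts(program, program_input):
    """Hypothetical oracle: return True iff program(program_input) halts.
    The inputs and outputs are well-defined; the body is what cannot exist."""
    raise NotImplementedError  # no general implementation is possible

def paradox(program):
    # Diagonal argument: if halts() existed, this program would halt
    # exactly when halts() says it doesn't, which is a contradiction.
    if halts(program, program):
        while True:
            pass  # loop forever
    return  # halt immediately

# Feeding paradox to itself exposes the contradiction:
# paradox(paradox) halts  <=>  halts(paradox, paradox) returns False.
```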