Okay, to simplify, suppose the AI has a function …
Boolean humankind_approves(Outcome o)
… that returns true (1) when humankind would approve of a particular outcome o, and false (0) otherwise.
At any given point, the AI needs to have a well-specified utility function.
Similarly, suppose the AI has a function …
Outcome U(Input i)
… which returns the outcome(s) (e.g., an answer or a plan) that maximize expected utility given the input i.
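A minimal Python sketch of what U could look like as an expected-utility maximizer. It assumes, hypothetically, that for a given input i the AI can enumerate finitely many candidate outcomes (`candidates`), has a probabilistic `model` of which world states each outcome leads to, and has a `utility` function over those states; none of these names come from the post. Because the post says "outcome(s)", the sketch returns every tied maximizer rather than just one.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical stand-ins for the post's types: an Outcome (answer, plan) and
# the world State it may lead to are just hashable values here.
Outcome = Hashable
State = Hashable
Input = object

# Assumed interface: model(outcome, i) -> [(probability, resulting_state), ...]
Model = Callable[[Outcome, Input], Sequence[tuple[float, State]]]


def expected_utility(outcome: Outcome, i: Input, model: Model,
                     utility: Callable[[State], float]) -> float:
    """Probability-weighted utility of the states `outcome` may lead to, given i."""
    return sum(p * utility(state) for p, state in model(outcome, i))


def U(i: Input, candidates: Callable[[Input], Sequence[Outcome]],
      model: Model, utility: Callable[[State], float]) -> list[Outcome]:
    """Return every candidate outcome that attains the maximum expected utility."""
    options = list(candidates(i))
    if not options:
        return []
    scores = [expected_utility(o, i, model, utility) for o in options]
    best = max(scores)
    # Keep exact ties, matching the "outcome(s)" in the post.
    return [o for o, s in zip(options, scores) if s == best]


# Toy usage (all values hypothetical):
outcomes = {
    "answer_a": [(0.5, "good"), (0.5, "bad")],
    "answer_b": [(1.0, "good")],
    "answer_c": [(1.0, "good")],
}
print(U("question",
        candidates=lambda i: list(outcomes),
        model=lambda o, i: outcomes[o],
        utility={"good": 1.0, "bad": 0.0}.__getitem__))
# ['answer_b', 'answer_c']
```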
But it doesn’t have any reason to care.
Assuming the AI is corrigible (I think we all agree that if the AI is not corrigible, it shouldn’t be turned on), we modify its utility function to U’ where
U’(i) = U(i) if humankind_approves(U(i)), and null otherwise. (If U(i) yields several equally good outcomes, U’(i) is one of them that humankind approves of, or null if none of them is approved.)
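Read as code, U’ is a thin filter around U. The Python sketch below is one possible reading, not a definitive implementation: it treats humankind_approves as a given oracle, assumes (from the "outcome(s)" wording) that U may return several equally good outcomes, and uses Python's None for null. The toy U, the outcome names, and the approval set in the usage lines are hypothetical.

```python
from typing import Callable, Hashable, Optional, Sequence

Outcome = Hashable  # hypothetical stand-in for the post's Outcome type


def make_u_prime(
    U: Callable[[object], Sequence[Outcome]],
    humankind_approves: Callable[[Outcome], bool],
) -> Callable[[object], Optional[Outcome]]:
    """Build U' from U: keep U's choice only if humankind approves of it."""

    def u_prime(i: object) -> Optional[Outcome]:
        # U may return several equally optimal outcomes; take the first approved one.
        for outcome in U(i):
            if humankind_approves(outcome):
                return outcome
        # No outcome returned by U is approved, so U' returns null (None).
        return None

    return u_prime


# Usage with toy stand-ins (both hypothetical):
toy_U = lambda i: ["build_dam", "plant_forest"] if i == "flooding" else ["pave_wetlands"]
approved = {"plant_forest"}
u_prime = make_u_prime(toy_U, lambda o: o in approved)

print(u_prime("flooding"))  # plant_forest
print(u_prime("housing"))   # None
```

The sketch filters the outcomes U already ranks as optimal rather than re-optimizing over approved outcomes, which follows the definition as written.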
I suggest that an AI with utility function U’ is a friendly AI.
It could look at the existing research
I think extrapolation from existing research is an interesting area of study, but I was trying to evoke the surprise of a breakthrough invention. To me, the most interesting inventions are exactly the ones that are not mundane extrapolations of existing techniques.