If I wanted that, I could just set it up to follow orders to the best of its understanding, and then order it around. The whole point is to make use of the fact that it’s smarter than I am and can achieve outcomes I can’t foresee in ways I can’t think up.
The AI here can do things which you wouldn’t think up.
For example, it could have more computational power than you to search for plans which maximize expected utility according to your probability and utility functions. Then, it could tell you the answer, if you’re the kind of person who likes to be told those kinds of answers (i.e., if this doesn’t violate your sense of autonomy/self-determination).
Or, if there is any algorithm P′ whose beliefs you trust more than your own, or would trust more than your own if certain conditions held (conditions the AI can itself check), then the AI can maximize the expected value of your utility function under P′ rather than under your own beliefs, since you would prefer that.
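To make that concrete (my notation, just restating the above): writing $P_H$ and $U_H$ for your probability and utility functions, the first case has the AI search for

$$a^* = \arg\max_a \; \mathbb{E}_{P_H}\left[\, U_H \mid a \,\right],$$

and the second case swaps in the trusted $P'$:

$$a^* = \arg\max_a \; \mathbb{E}_{P'}\left[\, U_H \mid a \,\right].$$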
For example, it could have more computational power than you to search for plans which maximize expected utility according to your probability and utility functions. Then, it could tell you the answer
Would it, though? It’s not evaluating actions by my future probutility; otherwise it would wirehead me. It’s evaluating actions by my present probutility. So the answer seems to depend on whether we allow “tell me the right answer” as a primitive action, or whether it is evaluated as “tell me [String],” which has low probutility.
But of course, if “tell me the right answer” is primitive, how do we stop “do the right thing” from being primitive, which lands us right back in the hot water of strong optimization of ‘utility’ that this proposal was supposed to prevent? So I think it should evaluate the specific output, which has low probability under the human’s distribution, and therefore not tell you.
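Here is a toy sketch of the distinction I have in mind (all names and numbers are made up for illustration, not taken from the post):

```python
# Toy model of "evaluate actions by the human's *present* probutility"
# (hypothetical names and numbers, for illustration only).

def present_probutility(action, outcomes, human_prob, human_utility):
    """Expected utility of `action` under the human's *current* beliefs:
    sum over outcomes of P_human(outcome | action) * U_human(outcome)."""
    return sum(human_prob(o, action) * human_utility(o) for o in outcomes)

outcomes = ["plan_succeeds", "plan_fails"]
human_utility = {"plan_succeeds": 1.0, "plan_fails": 0.0}.get

def human_prob(outcome, action):
    # Treated as a primitive, "tell me the right answer" looks great to the human...
    if action == "tell me the right answer":
        return 0.9 if outcome == "plan_succeeds" else 0.1
    # ...but any *specific* unfamiliar string currently looks unpromising to them.
    if action.startswith("tell me: "):
        return 0.1 if outcome == "plan_succeeds" else 0.9
    return 0.5

print(present_probutility("tell me the right answer", outcomes, human_prob, human_utility))      # 0.9
print(present_probutility("tell me: <specific plan text>", outcomes, human_prob, human_utility))  # 0.1
```

If only the concrete output can be scored, the specific string gets low present probutility and the AI stays quiet; if the primitive is allowed, we are back to strong optimization.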
I’ll try and write up a proof that it can do what I think it can.