Instead, we build an Oracle AI with an approximation to our utility function. The AI then chooses its output so as to get us to accomplish its goals, which are only mostly aligned with ours.
That interpretation directly contradicts the description given: it isn't compatible with the AI not caring about the future beyond an hour, or, for that matter, with it actually being an 'oracle' at all.
I was thinking of some of the extremely bad questions that people sometimes propose asking an oracle AI: "Why don't we just ask it how to make a lot of money?" and so on. Paul's example of asking it to give the output that gets us to press the reward button falls into the same category (unless I'm misinterpreting what he meant there?).