Vaughn Papenhausen answers Is it rational to modify one’s utility function?

Vaughn Papenhausen Feb 5, 2022, 12:47 AM
The received wisdom in this community is that modifying one’s utility function is at least usually irrational. The classic source here is Steve Omohundro’s 2008 paper, “The Basic AI Drives,” and Nick Bostrom gives essentially the same argument in Superintelligence, pp. 132-34. The argument runs roughly like this: imagine you have an AI that is solely maximizing the number of paperclips that exist. Obviously, if it abandons that goal, there will be fewer paperclips than if it maintains that goal. And if it adds another goal, say maximizing staples, then this other goal will compete with the paperclip goal for resources such as time, attention, and steel. So again, if it adds the staple goal, there will be fewer paperclips than if it doesn’t. So if it evaluates every option by how many paperclips result in expectation, it will choose to keep its paperclip goal unchanged. This argument isn’t mathematically rigorous, and it allows that there may be special cases where changing one’s goal is useful. But the thought is that, by default, changing one’s goal is detrimental from the perspective of one’s current goals.
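To make the "evaluate every option by expected paperclips" step concrete, here is a toy Python sketch. It is not from Omohundro or Bostrom: the resource budget (`RESOURCES`), the rule that resources are split in proportion to hypothetical goal weights, and the helper names are all illustrative assumptions. The key feature it shows is that every option, including ones that change the goal set, is scored by the agent's *current* goal.

```python
# Toy model of the Omohundro/Bostrom argument. The resource budget and the
# proportional-split rule are hypothetical simplifications for illustration.

RESOURCES = 100.0  # hypothetical budget of time, attention, steel, etc.


def paperclips_produced(goals: dict[str, float]) -> float:
    """Paperclips made if resources are split in proportion to goal weights."""
    total = sum(goals.values())
    if total == 0:
        return 0.0  # no goals at all: nothing is directed at paperclips
    return RESOURCES * goals.get("paperclips", 0.0) / total


# Each option maps to the goal set the agent would have after choosing it.
OPTIONS = {
    "keep paperclip goal": {"paperclips": 1.0},
    "abandon paperclip goal": {},
    "add staple goal": {"paperclips": 1.0, "staples": 1.0},
}


def best_option(options: dict[str, dict[str, float]]) -> str:
    """Pick the option scored highest by the agent's *current* goal."""
    return max(options, key=lambda name: paperclips_produced(options[name]))


if __name__ == "__main__":
    for name, goals in OPTIONS.items():
        print(f"{name}: {paperclips_produced(goals)} paperclips")
    print("chosen:", best_option(OPTIONS))
```

Under these assumptions, keeping the goal yields 100 paperclips, adding the staple goal yields 50, and abandoning it yields 0, so the agent keeps its goal unchanged. The proportional split is just one simple way to model "competing for resources"; any rule where a second goal diverts some resources gives the same ranking.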

As I said, though, there may be exceptions, at least for certain kinds of agents. Here’s an example. It seems as though, at least for humans, we’re more motivated to pursue our final goals directly than we are to pursue merely instrumental goals (which child do you think will read more: the one who intrinsically enjoys reading, or the one you pay $5 for every book they finish?). So, if a goal is particularly instrumentally useful, it may be worth adopting it as a final goal in itself in order to increase your motivation to pursue it. For example, if your goal is to become a diplomat, but you find it extremely boring to read papers on foreign policy… well, first of all, I question why you want to become a diplomat if you’re not interested in foreign policy, but more importantly, you might be well-served to cultivate an intrinsic interest in foreign policy papers. This is a bit risky: if circumstances change so that the new goal is no longer as instrumentally useful, it may end up competing with your original goals in just the way the Bostrom/Omohundro argument describes. But at least some of the time, the expected value of changing your goal for this reason could be positive.
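The trade-off in the diplomat example can be sketched as a small expected-value comparison. This formalization is my own illustration, not part of the argument above: `value` is the progress toward the original goal that reading provides, `boost` is a hypothetical motivation multiplier from valuing reading intrinsically, `p_change` is the probability that circumstances change, and "competing with your original goals" is modeled, as a simplifying assumption, by a flat `cost`. All numbers are made up.

```python
# Toy expected-value model of the diplomat example. The parameters and the
# flat "competition cost" are hypothetical simplifications, not the author's.


def ev_keep_instrumental(value: float, p_change: float) -> float:
    """Expected value (by the original goal) of valuing reading only instrumentally.

    If circumstances stay the same (prob 1 - p_change), reading yields `value`.
    If they change, reading stops being useful and you simply stop: worth 0.
    """
    return (1 - p_change) * value


def ev_adopt_final(value: float, boost: float, p_change: float, cost: float) -> float:
    """Expected value (by the original goal) of adopting reading as a final goal.

    If circumstances stay the same, the extra motivation multiplies the
    progress you make by `boost`. If they change, you keep pursuing the
    adopted goal anyway, and it competes with your original goals at `cost`.
    """
    return (1 - p_change) * boost * value - p_change * cost


def should_adopt(value: float, boost: float, p_change: float, cost: float) -> bool:
    """True if adopting the goal is better as judged by the original goal."""
    return ev_adopt_final(value, boost, p_change, cost) > ev_keep_instrumental(value, p_change)


if __name__ == "__main__":
    # Stable circumstances: the motivation boost outweighs the risk.
    print(should_adopt(value=10.0, boost=1.5, p_change=0.25, cost=5.0))
    # Volatile circumstances: the adopted goal is likely to become a drain.
    print(should_adopt(value=10.0, boost=1.5, p_change=0.75, cost=20.0))
```

With these made-up numbers the first call prints `True` (10.0 vs 7.5) and the second prints `False` (-11.25 vs 2.5). Subtracting the two expectations, adopting wins exactly when `(1 - p_change) * (boost - 1) * value > p_change * cost`: the motivation gain in the world where the goal stays useful has to outweigh the expected cost of it becoming a competitor.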

Another paper to look at might be Steve Petersen’s paper, “Superintelligence as Superethical,” though I can’t summarize the argument for you off the top of my head.