I’m not convinced “want to modify their utility functions” is the most useful perspective. I think it might be more helpful to say that we each have multiple utility functions, which conflict to varying degrees and have voting power in different areas of the mind. I’ve had first-hand experience with such conflicts (as essentially everyone probably has, knowingly or not), and it feels like fighting yourself.

Consider a hypothetical example: “Do I eat that extra donut?” Part of you wants the donut; that part feels more like an instinct, a visceral urge. Part of you knows you’ll be ill afterwards and will feel guilty about cheating on your diet; this part feels more like “you”, since it’s the part that thinks in words. You stand there and struggle, trying to make yourself walk away, even as your hand reaches out for the donut. I’ve been in similar situations where (though I balked at the possible philosophical ramifications) I felt that if I had a button to make me stop wanting the thing, I’d push it; yet often it was the other function that won.

I suspect that if you gave an agent the ability to modify its utility functions, which one won would depend on which had access to the mechanism (do you merely think the thought? push a button?) and on whether it understood what the mechanism means. (The word “donut” doesn’t evoke nearly as strong a reaction as a picture of a donut, for instance; your donut-craving subsystem doesn’t inherently understand the word.)
Conversely, one might argue that cravings for donuts are more hardwired instinct than part of the “mind”, and so don’t count... but I feel like 1. finding a true dividing line is going to be really hard, and 2. even setting that aside, I expect many or most people have goals localized in the same part of the mind that are nevertheless not internally consistent, and in some cases there may be reasonable-sounding goals that turn out to be completely incompatible with more important goals. In such a case I could imagine an agent deciding it’s better to stop wanting the thing it can’t have.
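To make the point about access to the mechanism concrete, here’s a minimal toy sketch (all names, weights, and numbers are hypothetical illustrations, not anyone’s actual model of cognition): two utility functions vote on actions with different strengths, but only one of them is wired to the self-modification button.

```python
# Toy sketch (hypothetical): an "agent" as a weighted vote between two
# conflicting utility functions, where only one subsystem is wired to
# the self-modification mechanism.

def craving_utility(action):
    # Visceral subsystem: just wants the donut, ignores consequences.
    return 1.0 if action == "eat_donut" else 0.0

def verbal_utility(action):
    # Deliberative subsystem: models the guilt and the stomach ache.
    return -1.0 if action == "eat_donut" else 0.5

class Agent:
    def __init__(self):
        # Voting power differs by subsystem; here the craving is
        # stronger at the moment of action (an assumption of the toy).
        self.weights = {"craving": 0.7, "verbal": 0.3}
        self.utilities = {"craving": craving_utility,
                          "verbal": verbal_utility}

    def choose(self, actions):
        # Each subsystem votes in proportion to its weight.
        def total(action):
            return sum(w * self.utilities[name](action)
                       for name, w in self.weights.items())
        return max(actions, key=total)

    def press_button(self):
        # The modification mechanism is only reachable from the verbal
        # subsystem: it can zero out the craving's voting power.
        self.weights["craving"] = 0.0

agent = Agent()
print(agent.choose(["eat_donut", "walk_away"]))  # eat_donut: craving out-votes
agent.press_button()                             # verbal subsystem edits the UF
print(agent.choose(["eat_donut", "walk_away"]))  # walk_away
```

Note the asymmetry this is meant to show: the craving out-votes the verbal subsystem in the moment, yet loses permanently once the mechanism is in reach of the other side.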
If you literally have multiple UFs, you literally are multiple agents. Or you could use a term with less formal baggage, like “preferences”.