The informal part of your opening sentence really hurts here. Humans don’t have time-consistent (or in many cases self-consistent) utility functions. It’s not clear whether AI could theoretically have such a thing, but let’s presume it’s possible.
The confusion comes from using a utility-maximizing framework to describe “what the agent wants”. If you want to change your utility function, that implies you don’t want what your current utility function says you want. Which means it’s not actually your utility function.
You can add epicycles here—a meta-utility-function that describes what you want to want, probably at a different level of abstraction. That makes your question sensible, but also trivial—of course your meta-utility function wants to change your utility function to more closely match your meta-goals. But then you have to ask whether you’d ever want to change your meta-function. And you get caught recursing until your stack overflows.
Much simpler and more consistent to say “if you want to change it, it’s not your actual utility function”.
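If it helps to see that regress spelled out, here’s a toy sketch (entirely made-up names, not any real agent design) of a meta-level endorsing whichever object-level utility function best serves the meta-goals:

# Toy illustration only: meta_utility scores world states by the meta-goals,
# candidate_utilities is a list of possible object-level utility functions,
# and outcomes is a small finite set of world states.
def best_object_level_utility(candidate_utilities, meta_utility, outcomes):
    def outcome_picked_by(u):
        # What an agent maximizing u would end up choosing.
        return max(outcomes, key=u)
    # The meta-level endorses whichever object-level function leads,
    # by the meta-level's own lights, to the best outcome.
    return max(candidate_utilities, key=lambda u: meta_utility(outcome_picked_by(u)))

Which just relocates the question one level up: would you ever want to swap out meta_utility? Answering that needs a meta-meta level, and the recursion never bottoms out.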
Ok, so if we programmed an AI with something like:
Utility = NumberOfPaperClipsCreated
while True:
    TakeAction(ActionThatWouldMaximize(Utility))
Would that mean its Utility Function isn’t really NumberOfPaperClipsCreated? Would an AI programmed like that edit its own code?
I don’t follow the scenario. If the AI is VNM-rational and has a utility function that is linear in the number of paperclips created, then it doesn’t WANT to edit the utility function, because no other function maximizes paperclips.
Conversely, if an agent WANTS to modify its utility function, that implies it’s not actually the utility function.
The utility function defines what “want” means.
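To make that concrete with a toy sketch (mine, with made-up names and a fake world model): treat “rewrite my utility function” as just another action, and score every action with the current function.

def expected_paperclips(action, world_model):
    # Stand-in for the agent's forecast of how many paperclips follow from `action`.
    return world_model[action]

def choose_action(actions, world_model):
    # Self-modification is scored by its consequences under the *current* goal
    # (paperclips), not under whatever goal it would install.
    return max(actions, key=lambda a: expected_paperclips(a, world_model))

# Rewriting the goal costs effort that could have made clips, so the current
# function never endorses it.
world = {"make_paperclips": 10, "rewrite_utility_function": 9}
assert choose_action(world, world) == "make_paperclips"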
Ok, so basically, we could make an AI that wants to maximize a variable called Utility, and that AI might edit its code, but we would probably figure out a way to write it so that it always evaluates the decision of whether to modify its utility function according to its current utility function, so it never would. Is that what you’re saying?
Also, maybe I’m conflating unrelated ideas here (I’m not in the AI field), but I think I recall there being a tiling problem: trying to prove that an agent that makes a copy of itself wouldn’t change its utility function. If any VNM-rational agent wouldn’t want to change its utility function, does that mean the question is just whether the AI would make a mistake when creating its successor?
Oh, maybe this is the confusion. It’s not a variable called Utility. It’s the actual true goal of the agent. We call it “utility” when analyzing decisions, and VNM-rational agents act as if they have a utility function over states of the world, but it doesn’t have to be external or programmable.
I’d taken your pseudocode as a shorthand for “design the rational agent such that what it wants is …”. It’s not literally a variable, nor a simple piece of code that non-simple code could change.
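One toy way to see the distinction (my framing, nothing standard): two agents that behave identically, only one of which stores its goal as data you could point to or overwrite.

# (a) goal stored as an editable attribute
class AgentA:
    def __init__(self):
        self.utility = lambda state: state["paperclips"]
    def act(self, options):
        return max(options, key=self.utility)

# (b) goal implicit in the decision procedure itself
class AgentB:
    def act(self, options):
        return max(options, key=lambda state: state["paperclips"])

# Both choose identically, so both "have the same utility function" in the VNM
# sense, even though only (a) has a variable you could point to and edit.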