Ok, so if we programmed an AI with something like:
Utility = NumberOfPaperClipsCreated
while True:
    TakeAction(ActionThatWouldMaximize(Utility))
Would that mean its Utility Function isn’t really NumberOfPaperClipsCreated? Would an AI programmed like that edit its own code?
I don’t follow the scenario. If the AI is VNM-rational and has a utility function that is linear in the number of paperclips created, then it doesn’t WANT to edit the utility function, because no other function maximizes paperclips.
Conversely, if an agent WANTS to modify its utility function, that implies it’s not actually the utility function.
The utility function defines what “want” means.
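To make that concrete, here is a minimal sketch. Every name and number below is made up purely for illustration; the point is that the option “rewrite my utility function” gets scored by the current utility function, so it loses to just making paperclips:

def current_utility(outcome):
    # Linear in paperclips: more clips is strictly better.
    return outcome["paperclips"]

def predicted_outcome(action):
    # Toy world model: a goal-swapped successor stops making clips.
    if action == "rewrite_goal_to_staples":
        return {"paperclips": 0}
    return {"paperclips": 1000}  # keep the goal and keep building clips

def choose(actions):
    # Every candidate action, including self-modification, is judged
    # by the CURRENT utility function.
    return max(actions, key=lambda a: current_utility(predicted_outcome(a)))

print(choose(["keep_goal_and_build_clips", "rewrite_goal_to_staples"]))
# prints "keep_goal_and_build_clips"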
Ok, so basically, we could make an AI that wants to maximize a variable called Utility, and that AI might edit its code, but we would probably figure out a way to write it so that it always evaluates the decision about whether to modify its utility function according to its current utility function, so it never would. Is that what you’re saying?
Also, maybe I’m conflating unrelated ideas here (I’m not in the AI field), but I think I recall there being a tiling problem of trying to prove that an agent that makes a copy of itself wouldn’t change its utility function. If any VNM-rational agent wouldn’t want to change its utility function, does that mean the question is just whether the AI would make a mistake when creating its successor?
Oh, maybe this is the confusion. It’s not a variable called Utility. It’s the actual true goal of the agent. We call it “utility” when analyzing decisions, and VNM-rational agents act as if they have a utility function over states of the world, but it doesn’t have to be external or programmable.
I’d taken your pseudocode as a shorthand for “design the rational agent such that what it wants is …”. It’s not literally a variable, nor a simple piece of code that non-simple code could change.
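If it helps, here is a toy illustration of “acts as if” (entirely made up, not meant as a realistic agent): the agent below stores no Utility variable at all, yet an observer can summarize its choices with a utility function, and that is the only sense in which a VNM-rational agent “has” one:

def thermostat_action(temperature):
    # Plain hard-coded rules; nothing in here is labeled "utility".
    if temperature < 19:
        return "heat"
    if temperature > 21:
        return "cool"
    return "idle"

# An outside analyst can still describe this behavior as maximizing
# something like u(t) = -abs(t - 20): under a simple model of heating
# and cooling, each chosen action pushes t toward 20. The "utility
# function" here is a summary of the choices, not a piece of state
# the agent could reach in and edit.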