There’s a result that’s almost a theorem: an agent that is an expected utility maximiser is stable under self-modification (or under the creation of successor sub-agents).
Of course, this only holds for a “reasonable” utility function: where no other agent cares about the internal structure of the agent (just its decisions), where the agent is not under any “social” pressure to make itself into something different, where the boundedness of the agent itself doesn’t affect its motivations, where issues of “self-trust” and acausal trade don’t affect it in relevant ways, and so on.
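A minimal sketch of the standard argument, in my own notation (not spelled out above): the current agent scores every option, including the option of creating or becoming a $u'$-maximiser, by the expectation of its current utility $u$. Since a $u'$-maximiser will in general take actions that score worse under $u$,

$$\mathbb{E}[u \mid \text{remain a } u\text{-maximiser}] \;\geq\; \mathbb{E}[u \mid \text{become a } u'\text{-maximiser}],$$

so, under the caveats above, the agent’s current utility function is a fixed point of its own self-modification decisions.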
I know you aren’t trying to list all the caveats, but I think there are other important ways this can go wrong. An agent may not be able to tell whether a self-modification will be successful: the modification can still have high expected utility even when it carries some risk of changing the agent’s preferences.
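A toy illustration of that risk (numbers purely hypothetical): suppose a modification would raise the agent’s expected $u$-score from 10 to 20 if it works, but with probability 0.1 it silently corrupts the agent’s preferences, after which the expected $u$-score is 0. The agent computes $0.9 \times 20 + 0.1 \times 0 = 18 > 10$ and self-modifies anyway, knowingly accepting a 10% chance of no longer being a $u$-maximiser.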