Conditional on your deeming it optimal to make a metaphorical omelet by breaking metaphorical eggs, the metaphorical eggs will deem it less optimal to remain vulnerable to metaphorical breakage by you than they would if you did not so deem; therefore, deeming it optimal to break metaphorical eggs in order to make a metaphorical omelet can make omelet-level utility harder for you to obtain.
Many metaphorical eggs are not utility-maximizing agents.
True, and to the extent that they are not, the mechanism I specified would not activate.
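A minimal sketch of that mechanism (the payoff numbers, the cooperative "trade" option, and the hardening rule are all assumptions chosen purely for illustration, not part of the exchange above): agent-like eggs condition on the omelet-maker's disposition and harden when they expect to be broken, while non-agent eggs cannot respond, so the conditional only bites against eggs modeled as utility maximizers.

```python
# Toy model (illustrative assumptions only): agent-like "eggs" condition on the
# omelet-maker's disposition. If the maker is disposed to break eggs, an
# agent-like egg hardens and breaking fails; a maker not so disposed is assumed
# to obtain the omelet cooperatively at a small cost. Non-agent eggs cannot
# condition on anything, so the mechanism never activates against them.

OMELET_VALUE = 10.0   # utility of an omelet to the maker (assumed)
TRADE_COST = 2.0      # cost of obtaining eggs cooperatively (assumed)

def egg_hardens(maker_disposed_to_break: bool, egg_is_agent: bool) -> bool:
    """Agent-like eggs harden exactly when they expect to be broken."""
    return egg_is_agent and maker_disposed_to_break

def maker_utility(disposed_to_break: bool, egg_is_agent: bool) -> float:
    if disposed_to_break:
        # Breaking succeeds only against an egg that has not hardened.
        return 0.0 if egg_hardens(disposed_to_break, egg_is_agent) else OMELET_VALUE
    # Not disposed to break: obtain the omelet cooperatively instead.
    return OMELET_VALUE - TRADE_COST

for egg_is_agent in (True, False):
    for disposed in (True, False):
        print(f"agent egg={egg_is_agent!s:5} disposed to break={disposed!s:5} "
              f"-> maker utility {maker_utility(disposed, egg_is_agent):.1f}")
# Against agent-like eggs the break-disposed maker does worse (0.0 vs 8.0);
# against non-agent eggs the mechanism does not activate (10.0 vs 8.0).
```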
Redefining one’s own utility function so as to make it easier to achieve is the road that leads to wireheading.
Correct. However, the method I proposed does not involve redefining one’s utility function: terminal values are left unchanged. It simply recognizes that some ways of achieving one’s pre-existing terminal values work better than others, so only instrumental values are altered.
The method I proposed is similar to pre-commitment for a causal decision theorist on a Newcomb-like problem. For such an agent, “locking out” future decisions can improve expected utility without altering terminal values. Likewise, a decision theory that fully absorbs such outcome-improving “lockouts” so that it outputs the same actions without explicit pre-commitment can increase its expected utility for the same utility function.
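As a concrete illustration of the pre-commitment analogy, here is a sketch under assumed payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box, predictor accuracy 0.99; none of these figures come from the exchange above). The utility function, money, is identical in both cases; only the action output differs.

```python
# Newcomb-like problem with assumed numbers: $1,000 in the visible box;
# $1,000,000 is placed in the opaque box iff the predictor expects one-boxing;
# predictor accuracy 0.99. Same utility function (money) throughout; only the
# decision procedure differs.

SMALL = 1_000
BIG = 1_000_000
PREDICTOR_ACCURACY = 0.99

def expected_utility(one_boxes: bool) -> float:
    """Expected payoff given the agent's actual (and hence usually predicted) choice."""
    if one_boxes:
        # Opaque box is filled whenever the predictor correctly foresees one-boxing.
        return PREDICTOR_ACCURACY * BIG
    # Two-boxing: opaque box is filled only when the predictor errs.
    return SMALL + (1 - PREDICTOR_ACCURACY) * BIG

print(f"two-box at choice time      : {expected_utility(one_boxes=False):>11,.0f}")
print(f"pre-committed to one-boxing : {expected_utility(one_boxes=True):>11,.0f}")
# Pre-commitment ("locking out" the later two-boxing choice) raises expected
# utility without touching terminal values; a decision theory that outputs
# one-boxing directly gets the same benefit without explicit pre-commitment.
```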