You’re correct that this includes problems where UDT performs poorly, and that UDT is by no means the One Final Answer.
What problems does UDT fail on?
My goal is to motivate the idea that we don’t know enough about decision theory yet to be comfortable constructing a system capable of undergoing an intelligence explosion.
Why would a self-improving agent not improve its own decision theory to reach an optimum without human intervention, given a “comfortable” utility function in the first place?
A self-improving agent does improve its own decision theory, but it uses its current decision theory to predict which self-modifications would be improvements, and broken decision theories can be wrong about that. Not all starting points converge to the same answer.
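To make that mechanism concrete, here is a small toy sketch (not from the discussion above; the theory names and scoring rules are invented for illustration). The agent chooses its successor decision theory using its current theory’s judgment, so a sound starting point converges while a flawed one can endorse a flawed successor indefinitely:

```python
# Toy illustration only: all names and scores below are hypothetical.

def true_value(theory):
    """Ground-truth quality of each theory (not directly available to the agent)."""
    return {"sound": 1.0, "myopic": 0.4, "broken": 0.1}[theory]

def perceived_value(current_theory, candidate):
    """How the *current* theory rates a candidate successor.
    The sound theory rates accurately; the others mis-rate (here, by inverting)."""
    v = true_value(candidate)
    return v if current_theory == "sound" else 1.0 - v

def self_improve(theory, candidates, steps=5):
    """Repeatedly replace the theory with whichever candidate it rates highest."""
    for _ in range(steps):
        theory = max(candidates, key=lambda c: perceived_value(theory, c))
    return theory

candidates = ["sound", "myopic", "broken"]
print(self_improve("sound", candidates))   # 'sound'  -- a sound start converges
print(self_improve("broken", candidates))  # 'broken' -- a broken start stays stuck
```

The sketch is only meant to show the structural point: the evaluation step itself runs on the current (possibly broken) decision theory, so different starting points need not converge.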
Oh. Oh dear. DERP. Of course: the decision theory of sound self-improvement is a special case of the decision theory for dealing with other agents.