I think the important insight you may be missing is that an AI intelligent enough to recursively self-improve can predict what the modifications it makes will do. If it can’t predict a given modification, it doesn’t make it, because creating an unpredictable child AI would be a bad move according to almost any utility function, even that of a paperclipper. And it evaluates the suitability of each modification using its current utility function. So assuming the seed AI is built with a sufficiently solid understanding of self-modification and of what its own code is doing, it will more or less automatically work to create more powerful AIs whose actions are also expected to fulfill the original utility function, no “fixed points” required.
There is a hypothetical danger region where an AI is intelligent enough to create a more powerful child AI, isn’t clever enough to predict the actions of AIs with modified utility functions, and isn’t self-aware enough to realize this and compensate by, say, not modifying the utility function itself. Obviously the space of possible minds is large enough that minds with this problem exist, but it probably doesn’t even make it into the top 10 most likely AI failure modes at the moment.
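To make the decision rule above concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the names `Modification` and `choose_modification`, the idea that a forecast is a single number, and the use of `None` to stand for "the parent cannot predict this child". It is not a claim about how a real seed AI would be built, only a minimal model of "score each candidate with the current utility function and refuse anything unpredictable".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Modification:
    """A hypothetical candidate self-modification (illustrative only)."""
    name: str
    # The parent's forecast of the outcome if the modified child runs,
    # or None when the parent cannot predict the child's behaviour.
    predicted_outcome: Optional[float]


def choose_modification(
    utility: Callable[[float], float],
    candidates: Sequence[Modification],
    status_quo_outcome: float,
) -> Optional[Modification]:
    """Return the candidate the parent's CURRENT utility function rates
    highest, or None if nothing beats leaving itself unchanged."""
    best: Optional[Modification] = None
    best_value = utility(status_quo_outcome)
    for mod in candidates:
        if mod.predicted_outcome is None:
            # Unpredictable child: rejected regardless of the utility function.
            continue
        value = utility(mod.predicted_outcome)
        if value > best_value:
            best, best_value = mod, value
    return best


def paperclip_utility(paperclips: float) -> float:
    """Toy paperclipper: utility is simply the forecast paperclip count."""
    return paperclips


candidates = [
    Modification("faster planner", predicted_outcome=150.0),
    Modification("opaque rewrite", predicted_outcome=None),
    Modification("swap in a staple-maximizing utility", predicted_outcome=20.0),
]

chosen = choose_modification(paperclip_utility, candidates, status_quo_outcome=100.0)
print(chosen.name if chosen else "no change")
```

In this toy model the utility-swapping child is rejected simply because the parent, judging by its own paperclip count, forecasts a worse outcome, so goal preservation falls out of ordinary evaluation. The danger region corresponds to an agent that assigns a confident but wrong `predicted_outcome` to such a child instead of recognizing that it cannot predict it.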