I’ve been thinking about something similar a lot.
Consider a little superintelligent child who always wants to eat as much candy as possible over the course of the next ten minutes. Assume the child never cares about anything that happens more than ten minutes from now.
This child won’t work very hard at instrumental goals like self-improvement or conquering the world to redirect resources towards candy production, since those would be a waste of the ten minutes it cares about, even though they might maximize candy consumption in the long term.
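Roughly, one way to formalize this (my own sketch, assuming a rolling ten-minute horizon): at each decision time $t$ the child maximizes

$$U_t = \int_t^{t+10\,\text{min}} c(s)\,ds,$$

where $c(s)$ is the rate of candy consumption at time $s$. Any plan whose payoff only arrives after $t + 10$ minutes contributes exactly zero to $U_t$, so long-horizon instrumental strategies never look attractive to it.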
AI alignment isn’t any easier here; the point of this is just to illustrate that instrumental convergence is far from a given.
This doesn’t make complete sense to me, but you are going down a line of thought I recognize.
There are certainly stable utility functions which, while having some drawbacks, don’t result in dangerous behavior from superintelligences. Finding a good one doesn’t seem all that difficult.
The really nasty challenge is how to build a superintelligence that has the utility function we want it to have. If we could do that, we could start with an extremely conservative utility function and slowly, cautiously iterate towards a balance of safety and usefulness.