Convergent instrumental goals would make agent-like things become agents if they can self-modify (humans can’t do this to any strong extent).
If we make a model of a specific human, for example a morally sane and rationally educated person with an excellent understanding of everything said above, they could choose the right level of self-improvement, since they would understand the danger of becoming an agent too strongly oriented toward instrumental goals. I don't know any such person in real life, btw.