I mean, would you modify yourself into a coherent EU-maximizing superintelligence with no alignment guarantees? If that option became available in real life, would you take it? Of course not. And our hypothetical capable-but-not-coherent AI is facing the exact same question.
Why assume no alignment guarantees, and why modify yourself rather than build something separate? The concern is that even if a non-coherent AGI solves its own alignment problem correctly and builds an EU-maximizing superintelligence aligned with itself, the resulting superintelligence's utility function is still not aligned with humanity.
So the less convenient question should be, “Would you build a coherent optimizer if you had all the alignment guarantees you could want, and all the time in the world to make sure it’s done right?” A positive answer to that question from the first non-coherent AGIs supports the relevance of coherent optimizers and of aligning them.