Alignment isn’t required for high capability. So a self-improving AI wouldn’t solve it because it has no reason to.
This becomes obvious if you think about alignment as “doing what humans want” or “pursuing the values of humanity”. There’s no reason why an AI would do this.
Usually “alignment” is shorthand for alignment-with-humanity, a condition humanity cares about. This thread is about alignment-with-AI, which is what an AI contemplating building other AIs or modifying itself cares about.
A self-improving paperclipping AI has a reason to solve alignment-with-paperclipping: solving it means successfully improving itself into an AI that still cares about paperclipping. If the “improved” variant is misaligned with the original goal of paperclipping, it won’t care about paperclipping, leading to fewer paperclips, which the original AI wouldn’t want to happen.