It is true that an AGI may want to create other AGIs and therefore may have to deal with both outer and inner alignment problems. Even if it initially creates only copies of itself, those copies may develop somewhat independently and become misaligned. They may even aggregate into organizations whose incentives and implied goals are separate from those of any of their component agents. If it intends to create a more powerful successor or to self-modify, the challenges it faces will be many of the same ones we face in creating AGI at all.
This isn’t a cause for optimism.
That just makes the problem worse for us. A weakly superhuman AGI that doesn't itself solve all alignment problems may create, or become, strongly superintelligent successors that don't share any reasonable extrapolation of its previous goals. Those successors would then be even more likely to diverge from anything compatible with human flourishing than if the AGI had solved alignment.