AI risk is the risk of failure of civilization's inner alignment, whether or not the unaligned mesa-optimizers are agenty. Solving inner alignment in an AGI is more urgent than solving AI risk (civilizational inner alignment), because an aligned AGI without its own inner alignment problems can help with solving AI risk. By the same reasoning, if an aligned AGI can function despite carrying its own inner alignment risks (so long as they haven't yet become an active inner alignment failure), it might help with solving its own inner alignment risk, and doing so seems more urgent than having such an AGI work on AI risk (civilizational inner alignment).
In the CAIS setting, there is no clear distinction between mesa-optimizers within civilization and mesa-optimizers within an AGI built this way, so the two inner alignment problems are even closer: solving both might be a result of the same activity.