The AGI will be much more powerful than the agents.
It’s a security issue: in general, a mesa-optimizer is not intended or expected to be there at all (apart from narrow definitions of inner alignment that describe more tractable setups, which won’t capture the actually worrying mesa-optimization happening inside architectural black boxes; see point 17). Given enough leeway, it might get away with setting up steganographic cognition and persist in influencing its environment, even in the more dignified case where there actually is some monitoring for mesa-optimizers inside black boxes.