Because it was not trained using reinforcement learning and doesn’t have a utility function, which means that it won’t face problems like mesa-optimisation
I think this is at least a non-obvious claim. In principle, mesa-optimisation could occur outside of RL: a sufficiently capable (highly advanced, future) predictive model could come to contain an agent or optimiser internally, even if the system does not really have a base objective. In that case, it might be better to think in terms of training stories rather than the inner/outer alignment split. Furthermore, gradient hacking could still be an issue.