This concern seems relevant if (1) a discount factor is used in the RL setup (otherwise the system seems about as likely to be deceptively aligned, with or without the intervention, in order to eventually take over the world), and (2) the decision about whether the system is safe to deploy is based on its behavior during training.
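To make the role of (1) concrete, here is a minimal numeric sketch; all the numbers (the horizon `T`, the per-step reward, the size of the delayed payoff) are illustrative assumptions of mine, not from the paper. It just shows the arithmetic: a payoff delayed by T steps contributes gamma**T times its raw value to the discounted return, so discounting is what makes a "wait, then collect a large payoff" strategy compare differently against steady training-time reward.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

T = 1000  # hypothetical number of steps until the delayed payoff is reachable

honest = [1.0] * T                    # steady small reward every step
deceptive = [0.0] * (T - 1) + [1e6]   # nothing until one huge payoff at step T-1

for gamma in (1.0, 0.99, 0.9):
    print(f"gamma={gamma}: honest={discounted_return(honest, gamma):.2f}, "
          f"deceptive={discounted_return(deceptive, gamma):.2f}")

# With gamma = 1.0 the delayed payoff dominates no matter how long the wait;
# with gamma < 1 the gamma**T factor shrinks it, so whether discounting is
# used changes which strategy the return actually favors.
```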
As an aside, the following quote from the paper seems relevant here:
> Ensuring copies of the states of early potential precursor AIs are preserved to later receive benefits would permit some separation of immediate safety needs and fair compensation.