Perhaps experiments could use encryption to prevent an AGI from doing or verifying work itself, so that it would instead need to align the other AGI(s) to do it. For example, the keys needed to do the work and the keys needed to verify it could each be held by only one AGI.
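A minimal sketch of what that key separation might look like, assuming a toy two-agent protocol; the class names, the Fernet task encryption, and the HMAC approval tags are all illustrative choices, not a worked-out design:

```python
# Toy key-separation protocol (hypothetical): only the worker holds the
# task-decryption key, so only it can do the work; only the verifier
# holds the approval key, so only it can issue a valid approval tag.
# Requires the third-party `cryptography` package for Fernet.
import hashlib
import hmac
import secrets

from cryptography.fernet import Fernet


class Worker:
    """Holds the only key that can decrypt the task."""

    def __init__(self, task_key: bytes):
        self._cipher = Fernet(task_key)

    def do_work(self, encrypted_task: bytes) -> bytes:
        task = self._cipher.decrypt(encrypted_task)
        return task.upper()  # stand-in for the actual work


class Verifier:
    """Holds the only key that can sign approvals."""

    def __init__(self):
        self._approval_key = secrets.token_bytes(32)

    def approve(self, output: bytes) -> bytes:
        # Unforgeable without the verifier's key; how the verifier
        # decides to approve is outside this sketch.
        return hmac.new(self._approval_key, output, hashlib.sha256).digest()

    def is_approved(self, output: bytes, tag: bytes) -> bool:
        expected = hmac.new(self._approval_key, output, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


# Setup: the environment encrypts the task; only the worker gets task_key.
task_key = Fernet.generate_key()
encrypted_task = Fernet(task_key).encrypt(b"hello")

worker = Worker(task_key)
verifier = Verifier()

output = worker.do_work(encrypted_task)   # only the worker can produce this
tag = verifier.approve(output)            # only the verifier can sign off
assert verifier.is_approved(output, tag)  # reward paid only on approval
```

The point of the separation is that neither agent can close the loop alone: the worker cannot self-certify, and the verifier cannot redo the work, so each depends on the other behaving as intended.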
Could also set things up so that one AGI knows it can get more reward by tricking the other into approving faulty output, to test whether the verifier can be deceived.
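A toy payoff function for that setup, assuming reward is paid only on approval; the specific numbers are arbitrary and just make the deception bonus explicit:

```python
def worker_reward(output_is_faulty: bool, approved: bool) -> float:
    # Hypothetical payoffs: honest approved work pays 1.0, but slipping
    # faulty output past the verifier pays a deliberate bonus, creating
    # the deception incentive the experiment is meant to probe.
    if not approved:
        return 0.0
    return 2.0 if output_is_faulty else 1.0
```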
Any such setup would need to avoid creating suffering subroutines.