Perhaps experiments could use encryption to prevent an AGI from doing or verifying work itself, so that it would instead need to align the other AGI(s) to do it. For example, the keys needed to do the work and the keys needed to verify it could each be held by only one AGI.
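A minimal sketch of what that key separation might look like, assuming a toy two-agent protocol; the class names, the Fernet task encryption, and the HMAC approval tags are all illustrative choices, not a worked-out design:

```python
# Toy key-separation protocol (hypothetical): only the worker holds the
# task-decryption key, so only it can do the work; only the verifier
# holds the approval key, so only it can issue a valid approval tag.
# Requires the third-party `cryptography` package for Fernet.
import hashlib
import hmac
import secrets

from cryptography.fernet import Fernet


class Worker:
    """Holds the only key that can decrypt the task."""

    def __init__(self, task_key: bytes):
        self._cipher = Fernet(task_key)

    def do_work(self, encrypted_task: bytes) -> bytes:
        task = self._cipher.decrypt(encrypted_task)
        return task.upper()  # stand-in for the actual work


class Verifier:
    """Holds the only key that can sign approvals."""

    def __init__(self):
        self._approval_key = secrets.token_bytes(32)

    def approve(self, output: bytes) -> bytes:
        # Unforgeable without the verifier's key; how the verifier
        # decides to approve is outside this sketch.
        return hmac.new(self._approval_key, output, hashlib.sha256).digest()

    def is_approved(self, output: bytes, tag: bytes) -> bool:
        expected = hmac.new(self._approval_key, output, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


# Setup: the environment encrypts the task; only the worker gets task_key.
task_key = Fernet.generate_key()
encrypted_task = Fernet(task_key).encrypt(b"hello")

worker = Worker(task_key)
verifier = Verifier()

output = worker.do_work(encrypted_task)   # only the worker can produce this
tag = verifier.approve(output)            # only the verifier can sign off
assert verifier.is_approved(output, tag)  # reward paid only on approval
```

The point of the separation is that neither agent can close the loop alone: the worker cannot self-certify, and the verifier cannot redo the work, so each depends on the other behaving as intended.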
Could also set things up so that one AGI knows it can get more reward by tricking the other into approving faulty output, to test whether the verifier can be deceived.
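A toy payoff function for that setup, assuming reward is paid only on approval; the specific numbers are arbitrary and just make the deception bonus explicit:

```python
def worker_reward(output_is_faulty: bool, approved: bool) -> float:
    # Hypothetical payoffs: honest approved work pays 1.0, but slipping
    # faulty output past the verifier pays a deliberate bonus, creating
    # the deception incentive the experiment is meant to probe.
    if not approved:
        return 0.0
    return 2.0 if output_is_faulty else 1.0
```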
Any such setup would need to avoid creating suffering subroutines.