Concerning point 35 about playing AIs off against each other: I analyzed a particular scenario like this in a recent post and also concluded that cooperation between the AIs is the default outcome in many scenarios. However, in the last subsection of that post, I consider some ways to prevent an acausal trade of the kind Eliezer describes here (committing to share the universe with any AI reviewing the code). The idea is roughly that the code and as much information as possible about the AI doing the checking is deleted before the checked nanosystem is released. That way, the escaping AI cannot share half the universe with its checker, since not enough information about the checker and its goals is left in the universe to carry out this compromise. This reduces the checker's incentive to cooperate and hide alignment faults.
That post also describes various caveats and other ways such a setup could go wrong. I tried my best to be pessimistic and to point out the failure modes.