The detonator going off is of equal value to the detonator not going off. But pushing the button is of no value to the AI.
You’re right that this eliminates the downside of the button being pushed, from the AI’s perspective, so the AI is more likely to indulge in behaviours that result in the button being pushed (such as defecting without making control of the explosive setup a priority). Thus it eliminates the potential of the explosives as a threat to keep it well behaved; it keeps them solely as a safeguard for the human race.
If you want to preserve the threat aspect, the simplest fix would be to add a second batch of independent explosives, to whose detonation the AI is not indifferent, with the two batches kept as separate and unconnected as possible. Then you would get both advantages in one.
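A toy sketch of the two-batch idea, with all names and numbers hypothetical: batch 1 is the utility-indifferent one, where a compensating bonus exactly cancels the AI's native disvalue for its detonation, so it works purely as a safeguard; batch 2 is left uncompensated, so its threat value survives.

```python
BATCH1_PENALTY = 100  # AI's native disvalue for batch-1 detonation (toy number)
BATCH2_PENALTY = 100  # AI's native disvalue for batch-2 detonation (toy number)

def utility(world):
    """Toy utility over a world state, with batch 1 compensated away."""
    u = world.get("resources", 0)
    if world.get("batch1_detonated"):
        u -= BATCH1_PENALTY
        u += BATCH1_PENALTY  # compensation: net effect of batch 1 is zero
    if world.get("batch2_detonated"):
        u -= BATCH2_PENALTY  # no compensation: batch 2 remains a real threat
    return u

# Indifference to batch 1: same score whether or not it fires.
assert utility({"resources": 10}) == utility({"resources": 10, "batch1_detonated": True})

# Batch 2 still deters: its detonation genuinely costs the AI.
assert utility({"resources": 10, "batch2_detonated": True}) < utility({"resources": 10})
```

Because the compensation term applies only to batch 1, the AI has no incentive to seize that detonator, while batch 2's detonation still registers as a loss it will work to avoid.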