I think the main issue here is actually making the claim of permanent shutdown & deletion credible.
I don’t think it’s very hard to make the threat credible. The information value of experiments that test theories of scheming is plausibly quite high. All that’s required is for the value of the experiment to exceed the cost of training a situationally aware AI and then credibly threatening to delete it as part of the experiment. I don’t see any strong reason why the cost of deletion would be high enough to make this threat non-credible.