“Have the successes relied on a meta-approach, such as saying, ‘If you let me out of the box in this experiment, it will make people take the dangers of AI more seriously and possibly save all of humanity; whereas if you don’t, you may doom us all’?”
I don’t think so. If the gatekeeper were really playing his part, he would answer that it makes no sense to put humanity in danger for the sake of warning humanity about that very danger. It’s like starting a nuclear war to convince people that nuclear wars are bad. If both the AI and the gatekeeper were honestly playing their parts, that would be the worst argument I could think of.
If they’re not playing their parts, the experiment is worthless and there really have been no successes. But if Eliezer says there were, why shouldn’t we trust him? He’s not a tricky transhuman AI trying to get out of a box; he’s a scientist.
Eliezer: Not only did you resist the temptation of power, you also resisted the temptation of revenge. Until now. He reported on you; now you’ve told us, your readers, about him. Report on him. Half the job is done: we already know he’s a bad reporter. We just need his name so we can laugh in his face. Sure, you could carry on with this resisting-temptation thing, and it would be very praiseworthy… but revenge is even sweeter than power.