Is this your position: there is no acceptable reason to deliberately optimize for s-risky things like sadism, and doing so to red-team s-risk detection is obviously madness. But red-teaming conventional misalignment, which would simply kill everyone if the absolute worst happened and is the default anyway, might make some sense?
I’m not sure what you are getting at. Maybe? What you said is not quite how I’d put it, but it seems similar at least.