S-risks don’t happen by default; misaligned AGI does. Suppose the chance of lab escape is the same for a misaligned AGI and a deliberately s-risky AGI. That chance is small relative to the benefit in the first case (testing our alignment and interpretability techniques) but large relative to the benefit in the second (testing our s-risk-reduction techniques).
Analogy to gain-of-function research in biology: the reason it’s a bad idea to create novel pathogens in a lab is that they probably won’t appear at all if you don’t create them. If we were instead in a world where giant tech companies were busy creating novel pathogens designed specifically to kill as many people as possible, and injecting them into the water supply, then yes, it would make sense to do gain-of-function research in a desperate attempt to devise countermeasures in the brief window before someone else does it anyway.
Is this your position: there is no acceptable reason to deliberately optimize for s-risky things like sadism, and doing so to red-team s-risk detection is obviously madness; but red-teaming conventional misalignment, which is the default anyway and in the absolute worst case would simply kill everyone, maybe makes some sense?
I’m not sure what you are getting at. Maybe? What you said is not quite how I’d put it, but seems similar at least.