Do you have any plans to release the instructions in RefusalBench? I understand the reasons to not provide many details of your underlying technique, but given the limitations you highlight with AdvBench, wouldn’t access to RefusalBench provide safety researchers with a better benchmark to test new models on?