This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn’t want to publicly publish a benchmark which gave a laser-focused “how to be super dangerous” score. We aimed for a fuzzier decision boundary.
This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.
This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn’t want to publicly publish a benchmark which gave a laser-focused “how to be super dangerous” score. We aimed for a fuzzier decision boundary. This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.