If you are doing evals for CBRN capabilities, you are very much in the zone of terrorists killing billions of innocent people. Indeed, that’s practically the definition. There’s no citation; it’s just my personal experience while doing evals that are much too spicy to publish.
Of course, if you’re only doing evals for relatively tame proxy skills (e.g. WMDP), then you probably get less of this effect. I don’t have a quantification of the rates or specific datasets, just anecdata.