Agree. I think Google DeepMind might actually be the most forthcoming about this kind of thing, e.g., see their "Evaluating Frontier Models for Dangerous Capabilities" report.
I thought that paper was just dangerous-capability evals, not safety-related metrics like adversarial robustness.