I thought that paper was just dangerous-capability evals, not safety-related metrics like adversarial robustness.