Yay Anthropic for making great security commitments! (Beyond its previous very-incomplete commitments here and here.) I wish other labs would do this!
But less yay because these ‘commitments’ are merely aspirational — they don’t necessarily describe what Anthropic is actually doing, and there’s no timeline or accountability.
(Anthropic describes some currently-implemented practices here, but my impression is they’re very inadequate.)
As far as I can tell, they do commit to a fairly strong sort of “timeline” for implementing these things: before they scale to ASL-3-capable models (i.e. ones that pass their evals for autonomous capabilities or misuse potential).
I read it differently; in particular my read is that they aren’t currently implementing all of the ASL-2 security stuff (and they’re not promising to do all of the ASL-3 stuff before scaling to ASL-3). Clarity from Anthropic would be nice.
In “ASL-2 and ASL-3 Security Commitments,” they say things like “labs should” rather than “we will.”
Almost none of their security practices are directly visible from the outside, but whether they have a bug bounty program is. They don’t have one. Yet “Programs like bug bounties and vulnerability discovery should incentivize exposing flaws” is part of the ASL-2 security commitments.
I guess when they “publish a more comprehensive list of our implemented ASL-2 security measures” we’ll know more.