Yay Anthropic for making great security commitments! (Beyond its previous very-incomplete commitments here and here.) I wish other labs would do this!
But less yay because these ‘commitments’ are merely aspirational — they don’t necessarily describe what Anthropic is actually doing, and there’s no timeline or accountability.
(Anthropic describes some currently-implemented practices here, but my impression is they’re very inadequate.)
As far as I can tell, they do commit to a fairly strong sort of “timeline” for implementing these things: before they scale to ASL-3-capable models (i.e. ones that pass their evals for autonomous capabilities or misuse potential).
I read it differently; in particular my read is that they aren’t currently implementing all of the ASL-2 security stuff (and they’re not promising to do all of the ASL-3 stuff before scaling to ASL-3). Clarity from Anthropic would be nice.
In “ASL-2 and ASL-3 Security Commitments,” they say things like “labs should” rather than “we will.”
Almost none of their security practices are directly visible from the outside, but whether they have a bug bounty program is. They don’t have one. Yet “Programs like bug bounties and vulnerability discovery should incentivize exposing flaws” is part of the ASL-2 security commitments.
I guess when they “publish a more comprehensive list of our implemented ASL-2 security measures” we’ll know more.