It’s quite possible that Anthropic has some internal definition of “not meaningfully advancing the capabilities frontier” that is compatible with this release. But imo they shouldn’t get any credit unless they explain it.
I explicitly asked Anthropic whether they had a policy of not releasing models significantly beyond the state of the art. They said no, and that they believed Claude 3 was noticeably beyond the state of the art at the time of its release.
And to elaborate a little bit (based on my own understanding, not what they told me): their RSP sort of says the opposite. To avoid a “race to the bottom”, they base the decision to deploy a model on what harm it can cause, regardless of what models other companies have released. So if someone else releases a model with potentially dangerous capabilities, Anthropic can’t/won’t use that as cover to release something similar that they wouldn’t have released otherwise. I’m not certain whether this is the best approach, but I do think it’s coherent.
Note that ASLs are defined by risk relative to baseline, excluding other advanced AI systems. This means that a model that initially merits ASL-3 containment and deployment measures for national security reasons might later be reduced to ASL-2 if defenses against national security risks (such as biological or cyber defenses) advance, or if dangerous information becomes more widely available. However, to avoid a “race to the bottom”, the latter should not include the effects of other companies’ language models; just because other language models pose a catastrophic risk does not mean it is acceptable for ours to.
I can definitely imagine them plausibly believing they’re sticking to that commitment, especially with a sprinkle of motivated reasoning. It’s “only” incrementally nudging the publicly available SOTA, rather than taking bigger steps like GPT-2 → GPT-3 → GPT-4.