the security conditions could trigger a pause all on their own
I don’t understand how this is possible. The RSP appendix has the list of security conditions, and they are just a checklist of things that Anthropic is planning to do and can implement whenever they want. It’s not cheap for them to implement them, but I don’t see any real circumstance where they fail to implement the security conditions in a way that would force them to pause.
Like, I agree that some of these commitments are costly, but I don’t see how there is any world where Anthropic would like to continue scaling but finds itself incapable of doing so, which is what I would consider a “pause” to mean. Like, they can just implement their checklist of security requirements and then go ahead.
Maybe this is quibbling over semantics, but it really does feel quite qualitatively different to me. When OpenAI said that they would spend some substantial fraction of their compute on “Alignment Research” while they train their next model, I think it would have been misleading to say “OpenAI has committed to conditionally pausing model scaling”.
I mean, I agree that humanity theoretically knows how to implement these sorts of security commitments, so the current conditions should always be possible for Anthropic to unblock with enough time and effort. But the commitment to the sequencing (that the security conditions have to be in place before Anthropic has an ASL-3 model) means there are situations where Anthropic commits to pause scaling until the security commitments are met.

I agree with you that this is a relatively weak commitment in terms of a scaling pause, though to be fair I don’t actually think simply having (but not deploying) a just-barely-ASL-3 model poses much of a risk, so I think it makes sense from a risk-based perspective why most of the commitments are around deployment and security. That being said, even if a just-barely-ASL-3 model doesn’t pose an existential risk, so long as ASL-3 is defined only with a lower bound rather than an upper bound as well, the category will eventually contain models that pose a potential existential risk, so I agree that a lot is tied up in the upcoming definition of ASL-4.

Regardless, it is still the case that Anthropic has already committed to a scaling pause under certain circumstances.
Regardless, it is still the case that Anthropic has already committed to a scaling pause under certain circumstances.
I disagree that this is an accurate summary, or like, it’s only barely denotatively true but not connotatively.
I do think it’s probably best to let this discussion rest, not because it’s not important, but because actually resolving this kind of semantic dispute in public comments like this is really hard, and I think it’s unlikely either of us will change our minds here, and we’ve both made our points. I appreciate you responding to my comments.
I think that there’s a reasonable chance that the current security commitments will lead Anthropic to pause scaling (though I don’t know whether Anthropic would announce publicly if they paused internally). Maybe a Manifold market on this would be a good idea.
That seems cool! I made a market here:
Feel free to suggest edits about the operationalization or other things before people start trading.
Looks good—the only thing I would change is that I think this should probably resolve in the negative only once Anthropic has reached ASL-4, since only then will it be clear whether at any point there was a security-related pause during ASL-3.
That seems reasonable. Edited the description (I can’t change when trading on the market closes, but I think that should be fine).