Securing model weights is underrated for AI safety. (Even though it’s very highly rated.) If the leading lab can’t stop critical models from leaking to actors that won’t use great deployment safety practices, approximately nothing else matters. Safety techniques would need to be based on properties that those actors are unlikely to reverse (alignment, maybe unlearning) rather than properties that would be undone or that require a particular method of deployment (control techniques, RLHF harmlessness, deployment-time mitigations).
However hard the "make a critical model you can safely deploy" problem is, the "make a critical model that can safely be stolen" problem is… much harder.
None of the actors who currently seem likely to me to deploy highly capable systems seem like they will do anything except scale approximately as fast as they can. I do agree that proliferation is still bad simply because you get more samples from the distribution, but I don’t think that changes the probabilities that drastically for me (I am still in favor of work on securing model weights, especially in the long run).
Separately, I think it’s currently pretty plausible that model weight leaks will substantially reduce the profit of AI companies by reducing their moat, and that effect size seems plausibly larger than the benefits of non-proliferation.
My central story is that AGI development will eventually be taken over by governments, in more or less subtle ways. So the importance of securing model weights now is mostly about less scrupulous actors having less of a head start during the transition/after a governmental takeover.
IMO someone should consider writing a “how and why” post on nationalizing AI companies. It could accomplish a few things:
Ensure there’s a reasonable plan in place for nationalization. That way if nationalization happens, we can decrease the likelihood of it being controlled by Donald Trump with few safeguards, or something like that. Maybe we could take inspiration from a less partisan organization like the Federal Reserve.
Scare off investors. Just writing the post and having it be discussed a lot could scare them.
Get AI companies on their best behavior. Maybe Sam Altman would finally be pushed out if congresspeople made him the poster child for why nationalization is needed.
@Ebenezer Dukakis I would be even more excited about a “how and why” post for internationalizing AGI development and spelling out what kinds of international institutions could build + govern AGI.
There is now some work in that direction: https://forum.effectivealtruism.org/posts/47RH47AyLnHqCQRCD/soft-nationalization-how-the-us-government-will-control-ai
What sort of leaks are we talking about? I doubt a sophisticated hacker is going to steal weights from OpenAI just to post them on 4chan. And I doubt OpenAI’s weights will be stolen by anyone except a sophisticated hacker.
If you want to reduce the incentive to develop AI, how about passing legislation to tax it really heavily? That is likely to have popular support due to the threat of AI unemployment. And it reduces the financial incentive to invest in large training runs. Even just making a lot of noise about such legislation creates uncertainty for investors.
I think you are overrating it. The biggest concern comes from whoever trains a model that passes some threshold in the first place, not from a model that one actor has been using for a while getting leaked to another actor. The bad actor who got access to the leak is always going to be behind in multiple ways in this scenario.
The weights could be stolen as soon as the model is trained, though.
This seems somewhat overstated. You might hope that you can get the safety tax sufficiently low that you can just do full competition (e.g. even though there are rogue AIs, you just compete with these rogue AIs for power). This also requires the offense-defense imbalance to not be too bad.
I overall agree that securing model weights is underrated and that it is plausibly the most important thing on current margins.
In principle, if reasonable actors start with a high fraction of resources (e.g. compute), then you might hope that they can keep that fraction of power (in expectation at least).
See also “The strategy-stealing assumption”. But also “What does it take to defend the world against out-of-control AGIs?”.
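The "keep your starting fraction of power" intuition can be made concrete with a toy sketch. Under the (strong, strategy-stealing-style) assumption that every actor's resources compound at the same rate, relative shares are exactly preserved no matter how long growth continues; the function names and numbers below are purely illustrative, not from the discussion above.

```python
def shares(initial, growth_rate, steps):
    """Grow each actor's resources multiplicatively at a common rate,
    then return each actor's fraction of the total."""
    resources = list(initial)
    for _ in range(steps):
        resources = [r * (1 + growth_rate) for r in resources]
    total = sum(resources)
    return [r / total for r in resources]

# Hypothetical example: "reasonable actors" start with 70% of compute.
# Because everyone scales by the same factor, the split stays ~70/30
# even after many doublings.
print(shares([70, 30], growth_rate=0.5, steps=20))
```

The interesting (and contested) part of the argument is precisely whether the equal-growth-rate assumption holds, e.g. whether safety taxes or offense-defense imbalances let some actors compound faster than others; the sketch only shows what follows if it does hold.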
Commenting to note that I think this quote is locally invalid:
If the leading lab can’t stop critical models from leaking to actors that won’t use great deployment safety practices, approximately nothing else matters
There are other disjunctive problems with the world which are also individually sufficient for doom[1], in which case each of them matters a lot in the absence of some fundamental solution to all of them.
[1] E.g. lack of superintelligence-alignment/steerability progress.
Minor point, but I think we might have some time here. Securing model weights becomes more important as models become better, but better models could also help us secure model weights (would help us code, etc).