[reposting from Twitter, lightly edited/reformatted] Sometimes I think the whole policy framework for reducing catastrophic risks from AI boils down to two core requirements—transparency and security—for models capable of dramatically accelerating R&D.
If you have a model that could lead to general capabilities much stronger than human-level within, say, 12 months, by significantly improving subsequent training runs, the public and scientific community have a right to know this exists and to see at least a redacted safety case; and external researchers need to have some degree of red-teaming access. Probably various other forms of transparency would be useful too. It feels like this is a category of ask that should unite the “safety,” “ethics,” and “accelerationist” communities?
And the flip side is that it’s very important that a model capable of kicking off that kind of dash to superhuman capabilities not get stolen/exfiltrated, such that you don’t wind up with multiple actors facing enormous competitive tradeoffs to rush through this process.
These have some tradeoffs, especially as you approach AGI—e.g. if you develop a system that can do 99% of foundation model training tasks and your security is terrible, you do have some good reasons not to immediately announce it—but not if we make progress on either of these before then, IMO. What the Pareto Frontier of transparency and security looks like, and where we should land on that curve, seems like a very important research agenda.
If you're interested in moving the ball forward on either of these, my colleagues and I would love to see your proposal and might fund you to work on it!
And the flip side is that it’s very important that a model capable of kicking off that kind of dash to superhuman capabilities not get stolen/exfiltrated, such that you don’t wind up with multiple actors facing enormous competitive tradeoffs to rush through this process.
Is it? My sense is the race dynamics get worse if you are worried that your competitor has access to a potentially pivotal model but you can’t verify that because you can’t steal it. My guess is the best equilibrium is major nations being able to access competing models.
Also, at least given present compute requirements, a smaller actor stealing a model is not that dangerous, since you need to invest hundreds of millions into compute to use the model for dangerous actions, which is hard to do secretly (though how much dangerous inference will actually cost is something I am quite confused about).
In general I am not super confident here, but I at least really don't know what the sign of hardening models against exfiltration is with regard to race dynamics.
My sense is the race dynamics get worse if you are worried that your competitor has access to a potentially pivotal model but you can’t verify that because you can’t steal it. My guess is the best equilibrium is major nations being able to access competing models.
What about limited API access to all actors for verification (aka transparency) while still having security?
It’s really hard to know that your other party is giving you API access to their most powerful model. If you could somehow verify that the API you are accessing is indeed directly hooked up to their most powerful model, and that the capabilities of that model aren’t being intentionally hobbled to deceive you, then I do think this gets you a lot of the same benefit.
Some of the benefit is still missing though. I think lack of moats is a strong disincentive to develop technology, and so in a race scenario you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately, and so you might end up with substantial timeline-accelerating effects by enabling better moats.
I do think the lack-of-moat benefit is smaller than the verification benefit.
I think it should be possible to get a good enough verification regime in practice with considerable effort. It’s possible that sufficiently good verification occurs by default due to spies.
I agree there will potentially be a lot of issues downstream of verification issues by default.
I think lack of moats is a strong disincentive to develop technology, and so in a race scenario you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”, but is instead “when will you have extreme security”.
(My response might overlap with tlevin’s, I’m not super sure.)
Here's an example way things could go:
1. An AI lab develops a model that begins to accelerate AI R&D substantially (say 10x) while having weak security. This model was developed primarily for commercial reasons and the possibility of it being stolen isn't a substantial disincentive in practice.
2. This model is immediately stolen by China.
3. Shortly after this, USG secures the AI lab.
4. Now, further AIs will be secure, but to stay ahead of China which has substantially accelerated AI R&D and other AI work, USG races to AIs which are much smarter than humans.
In this scenario, if you had extreme security ready to go earlier, then the US would potentially have a larger lead and better negotiating position. I think this probably gets you longer delays prior to qualitatively wildly superhuman AIs in practice.
There is a case that if you don’t work on extreme security in advance, then there will naturally be a pause to implement this. I’m a bit skeptical of this in practice, especially in short timelines. I also think that the timing of this pause might not be ideal—you’d like to pause when you already have transformative AI rather than before.
Separately, if you imagine that USG is rational and at least somewhat aligned, then I think security looks quite good, though I can understand why you wouldn’t buy this.
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”
Interesting, I guess my model is that the default outcome (in the absence of heroic efforts to the contrary) is indeed “no security for nation state attackers”, which as far as I can tell is currently the default for practically everything that is developed using modern computing systems. Getting to a point where you can protect something like the weights of an AI model from nation state actors would be extraordinarily difficult and an unprecedented achievement in computer security, which is why I don’t expect it to happen (even as many actors would really want it to happen).
My model of cybersecurity is extremely offense-dominated for anything that requires internet access or requires thousands of people to have access (both of which I think are quite likely for deployed weights).
The "how do we know if this is the most powerful model" issue is one reason I'm excited by OpenMined, who I think are working on this among other features of external access tools.
Interesting. I would have to think harder about whether this is a tractable problem. My gut says it’s pretty hard to build confidence here without leaking information, but I might be wrong.
If probability of misalignment is low, probability of human+AI coups (including e.g. countries invading each other) is high, and/or there aren’t huge offense-dominant advantages to being somewhat ahead, you probably want more AGI projects, not fewer. And if you need a ton of compute to go from an AI that can do 99% of AI R&D tasks to an AI that can cause global catastrophe, then model theft is less of a factor. But the thing I’m worried about re: model theft is a scenario like this, which doesn’t seem that crazy:
Company/country X has an AI agent that can do 99% [edit: let’s say “automate 90%”] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn’t know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
The weights for Agent-GPT-7 are available (legally or illegally) to company/country Y, which is known to company/country X.
Y has, say, a fifth of the compute. So each of those steps will take 20 months. Symmetrically, company/country Y thinks it’ll take 10-40 months and company/country X thinks it’s 5-80.
Once superintelligence is in sight like this, both company/country X and Y become very scared of the other getting it first—in the country case, they are worried it will undermine nuclear deterrence, upend their political system, basically lead to getting taken over by the other. The relevant decisionmakers think this outcome is better than extinction, but maybe not by that much, whereas getting superintelligence before the other side is way better. In the company case, it’s a lot less intense, but they still would much rather get superintelligence than their arch-rival CEO.
So, X thinks they have anywhere from 5-80 months before Y has superintelligence, and Y thinks they have 1-16 months. So X and Y both think it’s easily possible, well within their 80% CI, that Y beats X.
X and Y have no reliable means of verifying a commitment like “we will spend half our compute on safety testing and alignment research.”
If these weights were not available, Y would have a similarly good system in 18 months, 80% CI 12-24 (see the rough sketch of this timeline arithmetic below).
So, had the weights not been available to Y, X would be confident that it had 12 + 5 months to manage a capabilities explosion that would have happened in 8 months at full speed; it can spend >half of its compute on alignment/safety/etc, and it has 17 rather than 5 months of serial time to negotiate with Y, possibly develop some verification methods and credible mechanisms for benefit/power-sharing, etc. If various transparency reforms have been implemented, such that the world is notified in ~real-time that this is happening, there would be enormous pressure to do so; I hope and think it will seem super illegitimate to pursue this kind of power without these kinds of commitments. I am much more worried about X not doing this and instead just trying to grab enormous amounts of power if they’re doing it all in secret.
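A rough Monte Carlo sketch of the timeline arithmetic in the scenario above, from X's point of view. The lognormal step distributions, the seed, and the choice to treat the steps as independent are my own illustrative assumptions fitted to the 80% CIs stated above (2-8 months per step for X, 5-80 months per step for Y with the stolen weights, and 12-24 months for Y to independently reach a GPT-7-level system otherwise); this is a toy, not a forecast:

```python
import numpy as np

rng = np.random.default_rng(0)
Z90 = 1.2816  # 90th percentile of the standard normal distribution

def sample_months(lo, hi, size):
    """Sample step durations (in months) from a lognormal whose 80% CI is (lo, hi)."""
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / (2 * Z90)
    return rng.lognormal(mu, sigma, size)

N = 200_000

# X's view of its own two steps (GPT-7 -> GPT-8 -> GPT-9): 80% CI of 2-8 months each.
x_to_gpt9 = sample_months(2, 8, N) + sample_months(2, 8, N)

# X's view of Y's two steps if Y has the GPT-7 weights but only ~1/5 the compute:
# 80% CI of 5-80 months per step.
y_step1, y_step2 = sample_months(5, 80, N), sample_months(5, 80, N)

# If the weights stay secure, Y first needs 12-24 months to independently build
# a GPT-7-level system before it can start those steps.
y_indep = sample_months(12, 24, N)

scenarios = {
    "weights stolen": (y_step1, y_step1 + y_step2),
    "weights secure": (y_indep + y_step1, y_indep + y_step1 + y_step2),
}
for label, (y_to_gpt8, y_to_gpt9) in scenarios.items():
    print(f"{label}:")
    print(f"  10th percentile of time until Y's next big capability jump: "
          f"{np.percentile(y_to_gpt8, 10):.0f} months")
    print(f"  P(Y reaches GPT-9 before X): {(y_to_gpt9 < x_to_gpt9).mean():.1%}")
```

The exact outputs don't matter much and depend entirely on the assumed distributions; the point is that both X's worst-case planning margin before Y's next big jump and the probability that Y reaches GPT-9 first move substantially depending on whether the weights leak.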
[Also: I just accidentally went back a page by command-open bracket in an attempt to get my text out of bullet format and briefly thought I lost this comment; thank you in your LW dev capacity for autosave draft text, but also it is weirdly hard to get out of bullets]
Just pressing enter twice seems to work well enough for me, though I feel like I vaguely remember some bugged state where that didn't work.
Yeah, doing it again it works fine, but it was just creating a long list of empty bullet points (I also have this issue in GDocs sometimes).
Yeah, weird. I will see whether I can reproduce it somehow. It is quite annoying when it happens.
I expect that having a nearly-AGI-level AI, something capable of mostly automating further ML research, means the ability to rapidly find algorithmic improvements that result in:
1. drastic reductions in training cost for an equivalently strong AI.
- Making it seem highly likely that a new AI trained using this new architecture/method and a similar amount of compute as the current AI would be substantially more powerful. (thus giving an estimate of time-to-AGI)
- Making it possible to train a much smaller cheaper model than the current AI with the same capabilities.
2. speed-ups and compute-efficiency for inference on current AI, and for the future cheaper versions
3. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially shift military power when deployed to existing military hardware (e.g. better drone piloting models)
4. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially increase economic productivity of the receiving factories.
5. ability to rapidly innovate in non-ML technology, and thereby achieve military and economic benefits.
6. ability to create and destroy self-replicating weapons which would kill most of humanity (e.g. bioweapons), and also to create targeted ones which would wipe out just the population of a specific country.
If I were the government of a country in which such a tech were being developed, I would really not want other countries to be able to steal this tech. It would not seem like a worthwhile trade-off that the thieves would then have a more accurate estimate of how far from AGI my country's company was.
Spicy take: it might be more realistic to subtract 1 or even 2 from the numbers for the GPT generations, and also to consider that the intelligence explosion might be quite widely-distributed: https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=6EFv8PAvELkFopLHy
Company/country X has an AI agent that can do 99% [edit: let’s say “automate 90%”] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn’t know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
I strongly disagree, habryka, on the basis that I believe LLMs are already providing some uplift for highly harmful offense-dominant technology (e.g. bioweapons). I think this effect worsens the closer you get to full AGI. The inference cost to do this, even with a large model, is trivial. You just need to extract the recipe.
This gives a weak state-actor (or wealthy non-state-actor) that has high willingness to undertake provocative actions the ability to gain great power from even temporary access to a small amount of inference from a powerful model. Once they have the weapon recipe, they no longer need the model.
I’m also not sure about tlevin’s argument about ‘right to know’. I think the State has a responsibility to protect its citizens. So I certainly agree the State should be monitoring closely all the AI companies within its purview. On the other hand, making details of the progress of the AI publicly known may lead to increased international tensions or risk of theft or terrorism. I suspect it’s better that the State have inspectors and security personnel permanently posted in the AI labs, but that the exact status of the AI progress be classified.
I think the costs of biorisks are vastly smaller than AGI-extinction risk, and so they don’t really factor into my calculations here. Having intermediate harms before AGI seems somewhat good, since it seems more likely to cause rallying around stopping AGI development, though I feel pretty confused about the secondary effects here (but am pretty confident the primary effects are relatively unimportant).
I think that doesn’t really make sense, since the lowest hanging fruit for disempowering humanity routes through self-replicating weapons. Bio weapons are the currently available technology which is in the category of self-replicating weapons. I think that would be the most likely attack vector for a rogue AGI seeking rapid coercive disempowerment.
Plus, having bad actors (human or AGI) have access to a tech for which we currently have no practical defense, which could wipe out nearly all of humanity for under $100k… seems bad? Just a really unstable situation to be in?
I do agree that it seems unlikely that some terrorist org is going to launch a civilization-ending bioweapon attack within the remaining 36 months or so until AGI (or maybe even ASI). But I do think that manipulating a terrorist org into doing this, and giving them the recipe and supplies to do so, would be a potentially tempting tactic for a hostile AGI.
I think if AI kills us all it would be because the AI wants to kill us all. It is (in my model of the world) very unlikely to happen because someone misuses AI systems.
I agree that bioweapons might be part of that, but actually killing everyone via bioweapons requires extensive planning and deployment strategies, which humans won't want to execute (since they don't want to die). So if bioweapons are involved in all of us dying, it will very likely be the result of an AI that sees using them as an opportunity to take over, which I think is unlikely to happen because someone runs some leaked weights on a small amount of compute (or rather, that would happen years after the same AIs would have done the same when run on the world's largest computing clusters).
In general, for any story of “dumb AI kills everyone” you need a story for why a smart AI hasn’t killed us first.
I think if AI kills us all it would be because the AI wants to kill us all. It is (in my model of the world) very unlikely to happen because someone misuses AI systems.
I agree that it seems more likely to be a danger from AI systems misusing humans than humans misusing the AI systems.
What I don't agree with is jumping forward in time to thinking about when there is an AI so powerful it can kill us all at its whim. In my framework, that isn't a useful time to be thinking about, since it's too late for us to change the outcome at that point.
The key time to focus on is the time before the AI is sufficiently powerful to wipe out all of humanity with nothing we can do to stop it.
My expectation is that this period of time could be months or even several years, where there is an AI powerful enough and agentic enough to make a dangerous-but-stoppable attempt to take over the world. That’s a critical moment for potential success, since potentially the AI will be contained in such a way that the threat will be objectively demonstrable to key decision makers. That would make for a window of opportunity to make sweeping governance changes, and further delay take-over. Such a delay could be super valuable if it gives alignment research more critical time for researching the dangerously powerful AI.
Also, the period of time between now and when the AI is that powerful is one where AI-as-a-tool makes it easier and easier for humans aided by AI to deploy civilization-destroying self-replicating weapons. Current AIs are already providing non-zero uplift (both lowering barriers to access, and raising peak potential harms). This is likely to continue to rapidly get worse over the next couple years. Delaying AGI doesn’t much help with biorisk from tool AI, so if you have a ‘delay AGI’ plan then you need to also consider the rapidly increasing risk from offense-dominant tech.
Also—I'm not sure I'm getting the thing where verifying that your competitor has a potentially pivotal model reduces racing?
Same reason as knowing how many nukes your opponent has reduces racing. If you are conservative, the uncertainty in how far ahead your opponent is causes escalating races, even if you would both rather not escalate (as long as your mean is well-calibrated).
E.g. let's assume you and your opponent are de facto equally matched in the capabilities of your systems, but both have substantial uncertainty and, say, assign 30% probability to the opponent being substantially ahead. Then if you think those 30% of worlds are really bad, you will probably invest a bunch more into developing your systems (which your opponent will of course observe and respond to by increasing their own investment, and then you repeat).
However, if you can both verify how many nukes you have, you can reach a more stable equilibrium even under more conservative assumptions.
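To make that escalation loop concrete, here is a deliberately crude toy model. The best-response functional form, the baseline and escalation parameters, and the assumption that verification drives the "opponent is substantially ahead" probability to zero for two evenly matched actors are all my own illustrative choices, not anything specified above:

```python
def best_response(opponent_investment, p_opponent_ahead,
                  baseline=1.0, escalation=1.5):
    """Toy best response: invest a baseline amount, plus extra effort that scales
    with the opponent's observed investment, weighted by the probability you
    assign to them being substantially ahead of you."""
    return baseline + escalation * p_opponent_ahead * opponent_investment

def settled_investment(p_opponent_ahead, rounds=25):
    """Iterate simultaneous best responses for two symmetric, evenly matched actors."""
    x = y = 1.0
    for _ in range(rounds):
        x, y = (best_response(y, p_opponent_ahead),
                best_response(x, p_opponent_ahead))
    return x

# No verification: each side keeps 30% probability on "they may be well ahead of us".
print("investment without verification:", round(settled_investment(0.30), 2))
# With verification (inspections, model access, spies): that 30% tail is ruled out.
print("investment with verification:   ", round(settled_investment(0.00), 2))
```

With the 30% weight on "they might be well ahead of us", iterated best responses settle at a noticeably higher investment level than in the verified case (and blow up entirely for larger escalation parameters), which is the escalating-race intuition; the model is obviously far too simple to support any quantitative conclusion.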
Gotcha. A few disanalogies though—the first two specifically relate to the model theft/shared access point, while the third is true even if you had verifiable API access:
Me verifying how many nukes you have doesn't mean I suddenly have that many nukes, unlike AI model capabilities (though, due to compute differences, having the model does not mean we suddenly have the same time-distance to superintelligence).
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI that can automate a lot of AI R&D.
This model seems to assume you have an imprecise but accurate estimate of how many nukes I have, but companies will probably be underestimating the proximity of each other to superintelligence, for the same reason that they’re underestimating their own proximity to superintelligence, until it’s way more salient/obvious.
Me verifying how many nukes you have doesn't mean I suddenly have that many nukes, unlike AI model capabilities (though, due to compute differences, having the model does not mean we suddenly have the same time-distance to superintelligence).
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
I do also currently think that the compute-component will likely be a bigger deal than the algorithmic/weights dimension, making the situation more analogous to nukes, but I do think there is a lot of uncertainty on this dimension.
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI that can automate a lot of AI R&D.
Yeah, totally agree that this is an argument against proliferation, and an important one. While you might not end up with additional racing dynamics, the fact that more global resources can now use the cutting edge AI system to do AI R&D is very scary.
This model seems to assume you have an imprecise but accurate estimate of how many nukes I have, but companies will probably be underestimating the proximity of each other to superintelligence, for the same reason that they’re underestimating their own proximity to superintelligence, until it’s way more salient/obvious.
In-general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
Importantly though, once you have several thousand nukes the strategic returns to more nukes drop pretty close to zero, regardless of how many your opponents have, while if you get the scary model’s weights and then don’t use them to push capabilities even more, your opponent maybe gets a huge strategic advantage over you. I think this is probably true, but the important thing is whether the actors think it might be true.
In-general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
Yeah, good point.