If probability of misalignment is low, probability of human+AI coups (including e.g. countries invading each other) is high, and/or there aren’t huge offense-dominant advantages to being somewhat ahead, you probably want more AGI projects, not fewer. And if you need a ton of compute to go from an AI that can do 99% of AI R&D tasks to an AI that can cause global catastrophe, then model theft is less of a factor. But the thing I’m worried about re: model theft is a scenario like this, which doesn’t seem that crazy:
Company/country X has an AI agent that can do 99% [edit: let’s say “automate 90%”] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn’t know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
The weights for Agent-GPT-7 are available (legally or illegally) to company/country Y, which is known to company/country X.
Y has, say, a fifth of the compute. So each of those steps will take 20 months. Symmetrically, company/country Y thinks it’ll take 10-40 months and company/country X thinks it’s 5-80.
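(To make the scaling arithmetic explicit, here's a minimal sketch in Python. It assumes, as the point estimates above implicitly do, that a step's wall-clock time scales inversely with the fraction of X's compute a lab holds; the variable names are mine, and the numbers are just the ones from the scenario.)

```python
# Rough timeline arithmetic for the scenario above. Simplifying assumption:
# a training step's wall-clock time scales inversely with the fraction of X's
# compute that a lab controls.

FULL_SPEED_STEP_MONTHS = 4      # X's point estimate per step (GPT-7 -> GPT-8, GPT-8 -> GPT-9)
X_CI_PER_STEP = (2, 8)          # X's 80% CI per step at full compute
Y_CI_PER_STEP = (1, 16)         # Y's wider 80% CI per step at full compute
Y_COMPUTE_FRACTION = 1 / 5      # Y holds a fifth of X's compute stock

def months_at(fraction, months_at_full_speed):
    """Wall-clock months for a step run on `fraction` of X's compute."""
    return months_at_full_speed / fraction

print(months_at(Y_COMPUTE_FRACTION, FULL_SPEED_STEP_MONTHS))          # 20.0 months per step for Y
print([months_at(Y_COMPUTE_FRACTION, m) for m in X_CI_PER_STEP])      # [10.0, 40.0]
print([months_at(Y_COMPUTE_FRACTION, m) for m in Y_CI_PER_STEP])      # [5.0, 80.0]
```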
Once superintelligence is in sight like this, both company/country X and Y become very scared of the other getting it first—in the country case, they are worried it will undermine nuclear deterrence, upend their political system, and basically lead to getting taken over by the other. The relevant decisionmakers think this outcome is better than extinction, but maybe not by that much, whereas getting superintelligence before the other side is way better. In the company case, it's a lot less intense, but they would still much rather get superintelligence than see their arch-rival CEO get it first.
So, X thinks they have anywhere from 5-80 months before Y has superintelligence, and Y thinks they have 1-16 months. So X and Y both think it’s easily possible, well within their 80% CI, that Y beats X.
X and Y have no reliable means of verifying a commitment like “we will spend half our compute on safety testing and alignment research.”
If these weights were not available, Y would have a similarly good system in 18 months, 80% CI 12-24.
So, had the weights not been available to Y, X would be confident that it had 12 + 5 months to manage a capabilities explosion that would have happened in 8 months at full speed; it could spend >half of its compute on alignment/safety/etc., and it would have 17 rather than 5 months of serial time to negotiate with Y, possibly develop some verification methods and credible mechanisms for benefit/power-sharing, etc. If various transparency reforms have been implemented, such that the world is notified in ~real-time that this is happening, there would be enormous pressure on X to actually spend that time on safety work and negotiation; I hope and think it will seem super illegitimate to pursue this kind of power without these kinds of commitments. I am much more worried about X not doing this and instead just trying to grab enormous amounts of power if they're doing it all in secret.
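(For concreteness, here's my reading of the "12 + 5" bookkeeping as a tiny sketch; the lower bounds are taken from the estimates above, and the interpretation is mine.)

```python
# My reading of the serial-time bookkeeping above (all figures in months,
# lower bounds of X's estimates; illustrative only).
y_rebuilds_gpt7_lower = 12   # lower bound on Y independently reaching Agent-GPT-7 level
y_explosion_lower = 5        # X's lower bound on Y going from GPT-7-level weights
                             # to superintelligence

serial_time_if_weights_stolen = y_explosion_lower                          # 5
serial_time_if_weights_secure = y_rebuilds_gpt7_lower + y_explosion_lower  # 12 + 5 = 17
print(serial_time_if_weights_stolen, serial_time_if_weights_secure)        # 5 17
```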
[Also: I just accidentally went back a page by command-open bracket in an attempt to get my text out of bullet format and briefly thought I lost this comment; thank you in your LW dev capacity for autosave draft text, but also it is weirdly hard to get out of bullets]
I expect that having a nearly-AGI-level AI, something capable of mostly automating further ML research, means the ability to rapidly find algorithmic improvements that result in:
1. drastic reductions in training cost for an equivalently strong AI:
   - making it seem highly likely that a new AI trained using this new architecture/method and a similar amount of compute as the current AI would be substantially more powerful (thus giving an estimate of time-to-AGI)
   - making it possible to train a much smaller, cheaper model than the current AI with the same capabilities
2. speed-ups and compute-efficiency gains for inference on the current AI, and for the future cheaper versions
3. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially shift military power when deployed to existing military hardware (e.g. better drone piloting models)
4. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially increase the economic productivity of the factories that receive them.
5. ability to rapidly innovate in non-ML technology, and thereby achieve military and economic benefits.
6. ability to create and deploy self-replicating weapons which would kill most of humanity (e.g. bioweapons), and also to create targeted ones which would wipe out just the population of a specific country.
If I were the government of a country in which such a tech were being developed, I would really not want other countries to be able to steal it. It would not seem like a worthwhile trade-off that the thieves would then have a more accurate estimate of how far from AGI my country's company was.
Just pressing enter twice seems to work well enough for me, though I feel like I vaguely remember some bugged state where that didn't work.
Yeah, doing it again it works fine, but at the time it was just creating a long list of empty bullet points (I also have this issue in GDocs sometimes).
Yeah, weird. I will see whether I can reproduce it somehow. It is quite annoying when it happens.
Spicy take: it might be more realistic to subtract 1 or even 2 from the numbers for the GPT generations, and also to consider that the intelligence explosion might be quite widely distributed: https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=6EFv8PAvELkFopLHy