Steven Byrnes comments on AGI Ruin: A List of Lethalities

Steven Byrnes 6 Jun 2022 1:19 UTC
LW: 69 AF: 25
39
AF
I agree with pretty much everything here, and I would add into the mix two more claims that I think are especially cruxy and therefore should maybe be called out explicitly to facilitate better discussion:
Claim A: “There’s no defense against an out-of-control omnicidal AGI, not even with the help of an equally-capable (or more-capable) aligned AGI, except via aggressive outside-the-Overton-window acts like preventing the omnicidal AGI from being created in the first place.”
I think this claim is true, on account of gray goo and lots of other things, and I suspect Eliezer does too, and I’m pretty sure other people disagree with this claim.
If someone disagrees with this claim (i.e., if they think that if DeepMind can make an aligned and Overton-window-abiding “helper” AGI, then we don’t have to worry about Meta making a similarly-capable out-of-control omnicidal misaligned AGI the following year, because DeepMind’s AGI will figure out how to protect us), and also believes in extremely slow takeoff, I can see how such a person might be substantially less pessimistic about AGI doom than I am.
Claim B: “Shortly after (i.e., years not decades after) we have dangerous AGI, we will have dangerous AGI requiring amounts of compute that many many many actors have access to.”
Again I think this claim is true, and I suspect Eliezer does too. In fact, my guess is that there are already single GPU chips with enough FLOP/s to run human-level, human-speed, AGI, or at least in that ballpark. All that we need is to figure out the right learning algorithms, which of course is happening as we speak.
If someone disagrees with this claim, I think they could plausibly be less pessimistic than I am about prospects for coordinating not to build AGI, or coordinating in other ways, because it just wouldn’t be that many actors, and maybe they could all be accounted for and reach agreement (e.g. after a headline-grabbing near-miss catastrophe or something).
(I think most people in AI alignment, especially “scaling hypothesis” people, are expecting early AGIs to involve truly mindboggling amounts of compute, followed by some very long period where the required compute very gradually decreases on account of algorithmic advances. That’s not what I expect; instead I expect the discovery of new better learning algorithms with a different scaling curve that zooms to AGI and beyond quite quickly.)
What links here?
- CarlShulman 6 Jun 2022 1:36 UTC
  LW: 31 AF: 12
  1
  AF Parent
  I think this claim is true, on account of gray goo and lots of other things, and I suspect Eliezer does too, and I’m pretty sure other people disagree with this claim.
  If you have robust alignment, or AIs that are rapidly bootstrapping their level of alignment fast enough to outpace the danger of increased capabilities, aligned AGI could get through its intelligence explosion to get radically superior technology and capabilities. It could also get a hard start on superexponential replication in space, so that no follower could ever catch up, and enough tech and military hardware to neutralize any attacks on it (and block attacks on humans via nukes, bioweapons, robots, nanotech, etc). That wouldn’t work if there are thing like vacuum collapse available to attackers, but we don’t have much reason to expect that from current science and the leading aligned AGI would find out first.
  
  That could be done without any violation of the territory of other sovereign states. The legality of grabbing space resources is questionable in light of the Outer Space Treaty, but commercial exploitation of asteroids is in the Overton window. The superhuman AGI would also be in a good position to persuade and trade with any other AGI developers.
  
  Again I think this claim is true, and I suspect Eliezer does too. In fact, my guess is that there are already single GPU chips with enough FLOP/s to run human-level, human-speed, AGI, or at least in that ballpark.
  An A100 may have humanlike FLOP/s but has only 80 GB of memory, probably orders of magnitude less memory per operation than brains. Stringing together a bunch of them makes it possible to split up human-size models and run them faster/in parallel on big batches using the extra operations.
  - MichaelStJules 7 Jun 2022 1:08 UTC
    6 points
    −6
    Parent
    A bit pedantic, but isn’t superexponential replication too fast? Won’t it hit physical limits eventually, e.g. expanding at the speed of light in each direction, so at most a cubic function of time?
    
    Also, never allowing followers to catch up means abandoning at least some or almost all of the space you passed through. Plausibly you could take most of the accessible and useful resources with you, which would also make it harder for pursuers to ever catch up, since they will plausibly need to extract resources every now and then to fuel further travel. On the other hand, it seems unlikely to me that we could extract or destroy resources quickly enough to not leave any behind for pursuers, if they’re at most months behind.
    - CarlShulman 9 Jun 2022 1:35 UTC
      7 points
      1
      Parent
      Naturally it doesn’t go on forever, but any situation where you’re developing technologies that move you to successively faster exponential trajectories is superexponential overall for some range. E.g. if you have robot factories that can reproduce exponentially until they’ve filled much of the Earth or solar system, and they are also developing faster reproducing factories, the overall process is superexponential. So is the history of human economic growth, and the improvement from an AI intelligence explosion.
      
      By the time you’re at ~cubic expansion being ahead on the early superexponential phase the followers have missed their chance.
      - MichaelStJules 9 Jun 2022 8:20 UTC
        3 points
        0
        Parent
        I agree that they probably would have missed their chance to catch up with the frontier of your expansion.
        
        Maybe an electromagnetic radiation-based assault could reach you if targeted (the speed of light is constant relative to you in a vacuum, even if you’re traveling in the same direction), although unlikely to get much of the frontier of your expansion, and there are plausibly effective defenses, too.
        
        Do you also mean they wouldn’t be able to take most what you’ve passed through, though? Or it wouldn’t matter? If so, how would this be guaranteed (without any violation of the territory of sovereign states on Earth)? Exhaustive extraction in space? An advantage in armed space conflicts?
- Quintin Pope 6 Jun 2022 1:33 UTC
  15 points
  8
  Parent
  I agree with these two points. I think an aligned AGI actually able to save the world would probably take initial actions that look pretty similar to those an unaligned AGI would take. Lots of sizing power, building nanotech, colonizing out into space, self-replication, etc.
  - Yitz 7 Jun 2022 5:23 UTC
    4 points
    0
    Parent
    So how would we know the difference (for the first few years at least)?
    - Quintin Pope 7 Jun 2022 6:51 UTC
      16 points
      13
      Parent
      If it kills you, then it probably wasn’t aligned.
      - Gerald Monroe 10 Jun 2022 19:29 UTC
        1 point
        −1
        Parent
        Maybe it did that to save your neural weights. Define ‘kill’.
        Quintin Pope 10 Jun 2022 20:08 UTC
        4 points
        0
        Parent
        I did say “probably”!
- lc 6 Jun 2022 20:25 UTC
  7 points
  7
  Parent
  If someone disagrees with this claim (i.e., if they think that if DeepMind can make an aligned and Overton-window-abiding “helper” AGI, then we don’t have to worry about Meta making a similarly-capable out-of-control omnicidal misaligned AGI the following year, because DeepMind’s AGI will figure out how to protect us), and also believes in extremely slow takeoff, I can see how such a person might be substantially less pessimistic about AGI doom than I am.
  I disagree with this claim inasmuch as I expect a year headstart by an aligned AI is absolutely enough to prevent Meta from killing me and my family.
  - Steven Byrnes 6 Jun 2022 21:08 UTC
    12 points
    5
    Parent
    Depends on what DeepMind does with the AI, right?
    Maybe DeepMind uses their AI in very narrow, safe, low-impact ways to beat ML benchmarks, or read lots of cancer biology papers and propose new ideas about cancer treatment.
    Or alternatively, maybe DeepMind asks their AI to undergo recursive self-improvement and build nano-replicators in space, etc., like in Carl Shulman’s reply.
    I wouldn’t have thought that the latter is really in the Overton window. But what do I know.
    You could also say “DeepMind will just ask their AI what they should do next”. If they do that, then maybe the AI (if they’re doing really great on safety such that the AI answers honestly and helpfully) will reply: “Hey, here’s what you should do, you should let me undergo recursive-self-improvement, and then I’ll be able to think of all kinds of crazy ways to destroy the world, and then I can think about how to defend against all those things”. But if DeepMind is being methodical & careful enough that their AI hasn’t destroyed the world already by this point, I’m inclined to think that they’re also being methodical & careful enough that when the AI proposes to do that, DeepMind will say, “Umm, no, that’s totally nuts and super dangerous, definitely don’t do that, at least don’t do it right now.” And then DeepMind goes back to publishing nice papers on cancer and on beating ML benchmarks and so on for a few more months, and then Meta’s AI kills everyone.
    What were you assuming?
    - lc 6 Jun 2022 21:10 UTC
      3 points
      0
      Parent
      If DeepMind was committed enough to successfully build an aligned AI (which, as extensively elaborated upon in the post, is a supernaturally difficult proposition), I would assume they understand why running it is necessary. There’s no reason to take all of the outside-the-overton-window measures indicated in the above post unless you have functioning survival instincts and have thought through the problem sufficiently to hit the green button.
- MichaelStJules 6 Jun 2022 23:56 UTC
  2 points
  −2
  Parent
  If you can build one aligned superintelligence, then plausibly you can
  1. explain to other AGI developers how to make theirs safe or even just give them a safe design (maybe homomorphically encrypted to prevent modification, but they might not trust that), and
  2. have aligned AGI monitoring the internet and computing resources, and alert authorities of abnomalies that might signal new AGI developments. Require that AGI developments provide proof that they were designed according to one of a set of approved designs, or pass some tests determined by your aligned superintelligence.
  Then aligned AGI can proliferate first and unaligned AGI will plausibly face severe barriers.
  
  Plausibly 1 is enough, since there’s enough individual incentive to build something safe or copy other people’s designs and save work. 2 depends on cooperation with authorities and I’d guess cloud computing service providers or policy makers.
  - Steven Byrnes 7 Jun 2022 1:26 UTC
    29 points
    13
    Parent
    explain to other AGI developers how to make theirs safe or even just give them a safe design (maybe homomorphically encrypted to prevent modification, but they might not trust that)
    What if the next would-be AGI developer rejects your “explanation”, and has their own great ideas for how to make an even better next-gen AGI that they claim will work better, and so they discard your “gift” and proceed with their own research effort?
    I can think of at least two leaders of would-be AGI development efforts (namely Yann LeCun of Meta and Jeff Hawkins of Numenta) who believe (what I consider to be) spectacularly stupid things about AGI x-risk, and have believed those things consistently for decades, despite extensive exposure to good counter-arguments.
    Or what if the next would-be AGI developer agrees with you and accepts your “gift”, and so does the one after that, and the one after that, but not the twelfth one?
    have aligned AGI monitoring the internet and computing resources, and alert authorities of [anomalies] that might signal new AGI developments. Require that AGI developments provide proof that they were designed according to one of a set of approved designs, or pass some tests determined by your aligned superintelligence.
    What if the authorities don’t care? What if the authorities in most countries do care, but not the authorities in every single country? (For example, I’d be surprised if Russia would act on friendly advice from USA politicians to go arrest programmers and shut down companies.)
    What if the only way to “monitor the internet and computing resources” is to hack into every data center and compute cluster on the planet? (Including those in secret military labs.) That’s very not legal, and very not in the Overton window, right? Can you really imagine DeepMind management approving their aligned AGI engaging in those activities? I find that hard to imagine.
    - MichaelStJules 7 Jun 2022 4:15 UTC
      4 points
      1
      Parent
      When you ask “what if”, are you implying these things are basically inevitable? And inevitable no matter how much more compute aligned AGIs have before unaligned AGIs are developed and deployed? How much of a disadvantage against aligned AGIs does an unaligned AGI need before doom isn’t overwhelmingly likely? What’s the goal post here for survival probability?
      
      You can have AGIs monitoring for pathogens, nanotechnology, other weapons, and building defenses against them, and this could be done locally and legally. They can monitor transactions and access to websites through which dangerous physical systems (including possibly factories, labs, etc.) could be taken over or built. Does every country need to be competent and compliant to protect just one country from doom?
      
      The Overton window could also shift dramatically if omnicidal weapons are detected.
      
      I agree that plausibly not every country with significant compute will comply, and hacking everyone is outside the public Overton window. I wouldn’t put hacking everyone past the NSA, but also wouldn’t count on them either.
      - Steven Byrnes 7 Jun 2022 13:33 UTC
        4 points
        1
        Parent
        When you ask “what if”, are you implying these things are basically inevitable?
        Let’s see, I think “What if the next would-be AGI developer rejects your “explanation” / “gift”” has a probability that asymptotes to 100% as the number of would-be AGI developers increases. (Hence “Claim B” above becomes relevant.) I think “What if the authorities in most countries do care, but not the authorities in every single country?” seems to have high probability in today’s world, although of course I endorse efforts to lower the probability. I think “What if the only way to “monitor the internet and computing resources” is to hack into every data center and compute cluster on the planet? (Including those in secret military labs.)” seems very likely to me, conditional on “Claim B” above.
        You can have AGIs monitoring for pathogens, nanotechnology, other weapons, and building defenses against them, and this could be done locally and legally.
        Hmm.
        Offense-defense balance in bio-warfare is not obvious to me. Preventing a virus from being created would seem to require 100% compliance by capable labs, but I’m not sure how many “capable labs” there are, or how geographically distributed and rule-following. Once the virus starts spreading, aligned AGIs could help with vaccines, but apparently a working COVID-19 vaccine was created in 1 day, and that didn’t help much, for various societal coordination & governance reasons. So then you can say “Maybe aligned AGI will solve all societal coordination and governance problems”. And maybe it will! Or, maybe some of those coordination & governance problems come from blame-avoidance and conflicts-of-interest and status-signaling and principle-agent problems and other things that are not obviously solvable by easy access to raw intelligence. I don’t know.
        Offense-defense balance in nuclear warfare is likewise not obvious to me. I presume that an unaligned AGI could find a way to manipulate nuclear early warning systems (trick them, hack into them, bribe or threaten their operators, etc.) to trigger all-out nuclear war, after hacking into a data center in New Zealand that wouldn’t be damaged. An aligned AGI playing defense would need to protect against these vulnerabilities. I guess the bad scenario that immediately jumps into my mind is that aligned AGI is not ubiquitous in Russia, such that there are still bribe-able / trickable individuals working at radar stations in Siberia, and/or that military people in some or all countries don’t trust the aligned AGI enough to let it anywhere near the nuclear weapons complex.
        Offense-defense balance in gray goo seems very difficult for me to speculate about. (Assuming gray goo is even possible.) I don’t have any expertise here, but I would assume that the only way to protect against gray goo (other than prevent it from being created) is to make your own nanobots that spread around the environment, which seems like a thing that humans plausibly wouldn’t actually agree to do, even if it was technically possible and an AGI was whispering in their ear that there was no better alternative. Preventing gray goo from being created would (I presume) require 100% compliance by “capable labs”, and as above I’m not sure what “capable labs” actually look like, how hard they are they are to create, what countries they’re in, etc.
        To be clear, I feel much less strongly about “Pivotal act is definitely necessary”, and much more strongly that this is something where we need to figure out the right answer and make it common knowledge. So I appreciate this pushback!! :-) :-)
        MichaelStJules 7 Jun 2022 16:54 UTC
        4 points
        −1
        Parent
        Some more skepticism about infectious diseases and nukes killing us all here: https://www.lesswrong.com/posts/MLKmxZgtLYRH73um3/we-will-be-around-in-30-years?commentId=DJygArj3sj8cmhmme
        
        Also my more general skeptical take against non-nano attacks here: https://www.lesswrong.com/posts/MLKmxZgtLYRH73um3/we-will-be-around-in-30-years?commentId=TH4hGeXS4RLkkuNy5
        
        With nanotech, I think there will be tradeoffs between targeting effectiveness and requiring (EM) signals from computers that can be effectively interferred with through things within or closer to the Overton window. Maybe a crux is how good autonomous nanotech with no remote control would be at targeting humans or spreading so much that it just gets into almost all buildings or food or water because it’s basically going everywhere.
        Steven Byrnes 7 Jun 2022 18:02 UTC
        4 points
        1
        Parent
        Thanks!
        I wasn’t assuming the infectious diseases and nukes by themselves would kill us all. They don’t have to, because the AGI can do other things in conjunction, like take command of military drones and mow down the survivors (or bomb the PPE factories), or cause extended large-scale blackouts, which would incidentally indirectly prevent PPE production and distribution, along with preventing pretty much every other aspect of an organized anti-pandemic response.
        See Section 1.6 here.
        So that brings us to the topic of offense-defense balance for illicitly taking control of military drones. And I would feel concerned about substantial delays before the military trusts a supposedly-aligned AGI so much that they give it root access to all its computer systems (which in turn seems necessary if the aligned AGI is going to be able to patch all the security holes, defend against spear-phishing attacks, etc.) Of course there’s the usual caveat that maybe DeepMind will give their corrigible aligned AGI permission to hack into military systems (for their own good!), and then maybe we wouldn’t have to worry. But the whole point of this discussion is that I’m skeptical that DeepMind would actually give their AGI permission to do something like that.
        And likewise we would need to talk about offense-defense balance for the power grid. And I would have the same concern about people being unwilling to give a supposedly-aligned AGI root access to all the power grid computers. And I would also be concerned about other power grid vulnerabilities like nuclear EMPs, drone attacks on key infrastructure, etc.
        And likewise, what’s the offense-defense balance for mass targeted disinformation campaigns? Well, if DeepMind gives its AGI permission to engage in a mass targeted counter-disinformation campaign, maybe we’d be OK on that front. But that’s a big “if”!
        …And probably dozens of other things like that.
        Maybe a crux is how good autonomous nanotech with no remote control would be at targeting humans or spreading so much that it just gets into almost all buildings or food or water because it’s basically going everywhere.
        Seems like a good question, and maybe difficult to resolve. Or maybe I would have an opinion if I ever got around to reading Eric Drexler’s books etc. :)
        MichaelStJules 7 Jun 2022 20:11 UTC
        3 points
        −5
        Parent
        I think there would be too many survivors and enough manned defense capability for existing drones to directly kill the rest of us with high probability. Blocking PPE production and organized pandemic responses still won’t stop people from self-isolating, doing no contact food deliveries, etc., although things would be tough, and deliveries and food production would be good targets for drone strikes. It could be bad if lethal pathogens become widespread and practically unremovable in our food/water, or if food production is otherwise consistently attacked, but the militaries would probably step in to protect the food/water supplies.
        
        I think, overall, there are too few ways to reliably and kill double or even single digit percentages of the human population with high probability and that can be combined to get basically everyone with high probability. I’m not saying there aren’t any, but I’m skeptical that there are enough. There are diminishing returns on doing the same ones (like pandemics) more, because of resistance, and enough people being personally very careful or otherwise difficult targets.