This frightening logic leaves several paths to survival. One is to make personal intent aligned AGI and get it into the hands of a trustworthy-enough power structure. The second is to create a value-aligned AGI and release it as a sovereign, and hope we got its motivations exactly right on the first try. The third is to Shut It All Down: argue convincingly that the first two paths are unlikely to work, and convince every human group capable of creating AGI, or of preventing AGI work, of that case. None of these seem easy.[3]
Is there an option which is “personal intent aligned AGI, but there are 100 of them”? Maybe most governments have one, maybe some companies or rich individuals have one. Average Joes can rent a fine-tuned AGI by the token, but there are some limits on what values they can tune it to. There’s a balance of power between the AGIs similar to the balance of power between countries in 2024. Any one AGI could in theory destroy everything, except that the other 99 would oppose it, and so they pre-emptively prevent the creation of any AGI that would destroy everything.
AGIs have close-to-perfect information about each other and thus mostly avoid war because they know who would win, and the weaker AGI just defers in advance. If we get the balance right, no one AGI has more than 50% of the power, hopefully none have more than 20% of the power, such that no one can dominate.
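To make that intuition concrete, here is a minimal sketch of the standard “war as a costly lottery” argument, with made-up numbers (nothing here comes from the post itself): if both sides agree on who would likely win and fighting burns some of the value, there is always a range of peaceful splits both prefer to fighting, which is why the weaker side can just defer in advance.

```python
# Toy model of why near-perfect mutual information discourages war.
# All numbers are illustrative assumptions, not estimates.

def war_payoffs(p_strong: float, cost: float) -> tuple[float, float]:
    """Expected payoffs if two sides fight over a pie of size 1.
    p_strong: probability the stronger side wins; cost: fraction of the
    pie destroyed by fighting."""
    surviving = 1.0 - cost
    return p_strong * surviving, (1.0 - p_strong) * surviving

def peaceful_splits(p_strong: float, cost: float) -> tuple[float, float]:
    """Range of splits (share going to the stronger side) that both sides
    prefer to fighting. Non-empty whenever cost > 0."""
    strong_floor, weak_floor = war_payoffs(p_strong, cost)
    return strong_floor, 1.0 - weak_floor  # any share in [lo, hi] beats war for both

if __name__ == "__main__":
    p, c = 0.7, 0.2               # stronger side wins 70% of the time; war burns 20%
    print(war_payoffs(p, c))      # roughly (0.56, 0.24): expected values from fighting
    print(peaceful_splits(p, c))  # roughly (0.56, 0.76): both prefer any split in this range
```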
There’s a spectrum from “power is distributed equally amongst all 8 billion people in the world” to “one person or entity controls everything”, and this world might sit somewhat closer to the unequal end than we do now, but still somewhere along the spectrum.
I guess even if the default outcome is that the first AGI gets such a fast take-off it has an unrecoverable lead over the others, perhaps there are approaches to governance that distribute power to ensure that doesn’t happen.
I wish. I think a 100-way multipolar scenario would be too unstable to last more than a few years.
I think AGI that can self-improve produces really bad game theoretic equilibria, even when the total situation is far from zero-sum. I’m afraid that military technology favors offense over defense, and conflict won’t be limited to the infosphere. It seems to me that the nature of physics makes it way easier to blow stuff up than to make stuff that resists being blown up. Nukes produced mutually assured destruction because they’re so insanely destructive, and controlled only by nation-states with lots of soft targets. That situation isn’t likely to continue with 100 actors that can make new types of splodey weapons.
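A toy payoff matrix (the numbers are purely illustrative, not anything from the original comment) shows the kind of bad equilibrium in question: even when mutual restraint leaves both sides better off, developing weapons is each side's best reply to whatever the other does.

```python
# Toy security-dilemma game with made-up payoffs: mutual restraint is
# Pareto-better, but arming is a dominant strategy for each actor.

RESTRAIN, ARM = 0, 1
# PAYOFF[my_move][their_move] = my payoff
PAYOFF = {
    RESTRAIN: {RESTRAIN: 3, ARM: 1},
    ARM:      {RESTRAIN: 4, ARM: 2},
}

def best_reply(their_move: int) -> int:
    return max((RESTRAIN, ARM), key=lambda my_move: PAYOFF[my_move][their_move])

for their_move, label in [(RESTRAIN, "restrains"), (ARM, "arms")]:
    mine = best_reply(their_move)
    print(f"If the other side {label}, my best reply is {'ARM' if mine == ARM else 'RESTRAIN'}")
# Both lines print ARM, so (ARM, ARM) is the only equilibrium, even though
# (RESTRAIN, RESTRAIN) would give both sides a higher payoff.
```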
I hope I’m wrong.
This is probably worth a whole post, but here’s the basic logic. Some of this is crazy scifi stuff, but that’s the business we’re in here, I think, as AGI progresses to ASI.
If 100 people have AGI, some of those AGIs will pretty quickly become ASI: something so much smarter than humans that it can pretty quickly invent technologies that are difficult to anticipate. It seems like everyone will want to make their AGIs smarter as fast as possible, so they can do more good (by the holder’s definition) as well as serve better for defense or offense against potentially hostile actors. Even if everyone agrees not to develop weapons, it seems like the locally sane option is to at least come up with plans to quickly develop and deploy them, and probably to develop some.
A complete ban on developing weapons, and a strong agreement to split the enormous potential in some way might work—but I have no idea how we get from the current geopolitical state to there.
This hypothetical AGI-developed weaponry doesn’t have to be crazy scifi to be really dangerous. Let’s think of just drones carrying conventional explosives. Or kinetic strikes from orbit that can do about as much or as little damage as is needed for a particular target.
Now, why would anyone be shooting at each other in the first place? Hopefully they wouldn’t, but it would be very tempting to do, and kind of crazy to not build at least a few terrifying weapons to serve as deterrents.
Those 100 entities with AGI might be mostly sane and prefer the vastly larger pie (in the short term) from cooperation to getting a bigger slice of a smaller pie. But that pie will grow again very rapidly if someone decides to fight for full control of the future before someone else does. Those 100 AGI holders will have different ideologies and visions of the future, even if they’re all basically sane and well-intentioned.
Is this scenario stable? As long as you know who fired, you’ve got mutually assured destruction (sort of, but maybe not enough for total deterrence with more limited weapons and smaller conflicts). Having a way to hide or false-flag attacks disrupts that. Just set two rivals on each other if they stand in your way.
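Here is a rough, back-of-the-envelope version of why attribution matters so much for deterrence (hypothetical numbers throughout): a would-be attacker weighs the gain from an unattributed strike against the expected retaliation.

```python
# Toy deterrence-with-attribution calculation; G, R, and q are invented
# parameters for illustration. Expected value of attacking:
#   (1 - q) * G - q * R, where q is the probability the strike is attributed.

def attack_is_tempting(gain: float, retaliation: float, q_attribution: float) -> bool:
    return (1 - q_attribution) * gain - q_attribution * retaliation > 0

def attribution_threshold(gain: float, retaliation: float) -> float:
    """Minimum attribution probability needed for deterrence to hold."""
    return gain / (gain + retaliation)

G, R = 10.0, 100.0
print(attribution_threshold(G, R))     # ~0.09: deterrence needs >9% chance of attribution
print(attack_is_tempting(G, R, 0.05))  # True:  attribution too unreliable, deterrence fails
print(attack_is_tempting(G, R, 0.50))  # False: MAD-style deterrence holds
```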
The scarier, more complete solution to controlling the future is to eliminate this whole dangerous multipolar intelligence explosion at the source: Earth. As they say, the only way to be sure is to nuke it from orbit. It seems like one rational act is to at least research how to get a “seed” spacecraft out of the system, and to look for technologies or techniques that could make the Sun go nova. Presto, peace has been achieved under the one remaining ASI. You’d want to know if your rivals could do such a thing, and now that you’ve researched it, maybe you have the capability too. I’d assume everyone would.
I guess one way to avoid people blowing up your stuff is to just run and hide. One scenario is a diaspora, in which near-C sailships head out in every direction, with the goal of scattering and staying low-profile while building new civilizations. Some existing people might make it in the form of uploads sent with the von Neumann probes, or later by laser. But the culture and values could be preserved, and similar people could be recreated in the flesh or as uploads. The distances and delays might take the pressure off, and make building your own stuff way more attractive than fighting anyone else. And there could be retaliation agreements that actually work as deterrence if they’re designed by better minds than ours.
This is as far as I’ve gotten with this scenario. Hopefully there’s some stable equilibrium, and AGI can help us navigate to it even if we’re in charge. Maybe it’s another MAD thing, or meticulous surveillance of every piece of technology anyone develops.
But it seems way easier to achieve equilibrium on Earth the smaller the number of actors that have AGI capable of RSI.
Because they would immediately organize a merger! The most efficient negotiation outcome will line up with preference utilitarianism. War is a kind of waste; there’s always a better deal that could be made instead.
I also think there are plenty of indications (second section) that the mutual transparency required to carry out a fair (or “chaa”) merger is going to be pretty trivial with even slightly more advanced information technology.
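For what it's worth, here is a minimal Nash-bargaining sketch of that "war is a kind of waste" point, reusing the toy war payoffs from the sketch further up the thread; the specific numbers and the equal split of the surplus are modeling assumptions, not anything from the linked posts.

```python
# Minimal Nash-bargaining sketch of the "merger" idea: the disagreement point
# is what each side expects from fighting, and the bargain divides the value
# that war would have destroyed. Numbers are illustrative only.

def nash_bargain(d1: float, d2: float) -> tuple[float, float]:
    """Split a pie of size 1 by maximizing (x - d1) * (1 - x - d2).
    Closed form: each side gets its disagreement payoff plus half the surplus."""
    surplus = 1.0 - d1 - d2          # value that fighting would have burned
    return d1 + surplus / 2, d2 + surplus / 2

# Assumed war payoffs: 0.56 for the stronger side, 0.24 for the weaker side.
share_strong, share_weak = nash_bargain(0.56, 0.24)
print(share_strong, share_weak)  # ~0.66 and ~0.34: both beat their war payoffs,
                                 # and the split still tracks relative power ("chaa")
```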
I got a good way through the setup to your first link. It took a while. If you’d be so kind, it would be nice to have a summary of why you think that rather dense set of posts is so relevant here. What I read did not match your link text (“The most efficient negotiation outcome will line up with preference utilitarianism.”) closely enough for this purpose. In some cases, I can get more of my preferences with you eliminated; no negotiation necessary :).
The setup for that post was a single decision, with the failure to cooperate being pretty bad for both parties. The problem here is that that isn’t necessarily the case; the winner can almost take all, depending on their preferences. They can get their desired future in the long run, sacrificing only the short run, which is tiny if you’re really a longtermist. And the post doesn’t seem to address the iterated case; how do you know if someone’s going to renege, after some previous version of them has agreed to “fairly” split the future?
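A back-of-the-envelope version of that reneging worry, with made-up numbers: compare the discounted value of honoring a 50/50 split forever against paying a one-time cost of conflict and then owning the entire future. For sufficiently patient (longtermist) agents, reneging wins.

```python
# Toy check of the "why not renege later?" worry. The 50/50 split, the war
# cost, and the discount factors are all invented for illustration.

def value_of_cooperating(delta: float) -> float:
    return 0.5 / (1 - delta)                      # half the pie every period, forever

def value_of_reneging(delta: float, war_cost: float) -> float:
    return -war_cost + delta * 1.0 / (1 - delta)  # pay once, then take everything

for delta in (0.9, 0.99, 0.999):                  # more longtermist = higher delta
    coop = value_of_cooperating(delta)
    grab = value_of_reneging(delta, war_cost=20.0)
    verdict = "renege pays" if grab > coop else "deal holds"
    print(delta, round(coop, 1), round(grab, 1), verdict)
# At delta = 0.9 the deal holds; by delta = 0.99 grabbing the whole future
# already beats even a hefty one-time war cost.
```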
So I don’t understand how the posts you link resolve that concern. Sure with sufficient intelligence you can get “chaa” (from your linked post: “fair”, proportional to power/ability to take what you want), but what if “chaa” is everyone but the first actor dead?
If the solution is “sharing source code” as in earlier work I don’t think that’s at all applicable to network-based AGI; the three body problem of prediction applies in spades.
Hmm well I’d say it gets into that immediately, but it does so in a fairly abstract way. I’d recommend the whole lot though. It’s generally about what looks like a tendency in the math towards the unity of various bargaining systems.
“The setup for that post was a single decision”
A single decision can be something like “who to be, how to live, from now on”. There isn’t a strict distinction between single decision and all decisions from then on when acts of self-modification are possible, as self-modification changes all future decisions.
On reflection, I’m not sure bargaining theory undermines the point you were making; I do think it’s possible that one party or another would dominate the merger, depending on what the technologies of superintelligent war turn out to be and how much the participants’ utility functions care about near-term strife.
But the feasibility of converging towards merger seems like a relevant aspect of all of this.
Transparency aids (or suffices for) negotiation, but there won’t be much of a negotiation if, say, having nukes turns out to be a very weak bargaining chip and the power distribution is just about who gets nanotech[1] first or whatever, or if it turns out that human utility functions don’t care as much about loss of life in the near term as they do about owning the entirety of the future. I think the latter is very unlikely and the former is debatable.
I don’t exactly believe in “nanotech”. I think materials science advances continuously and practical molecularly precise manufacturing will tend to look like various iterations of synthetic biology (you need a whole lot of little printer heads in order to make a large enough quantity of stuff to matter). There may be a threshold here, though, which we could call “DNA 2.0” or something, a form of life that uses stronger things than amino acids.