If you can deploy nanomachines that melt all the GPU farms and prevent any new systems with more than 1 networked GPU from being constructed, that counts. That really actually suspends AGI development indefinitely pending an unlock, and not just for a brief spasmodic costly delay.
Can you please clarify:
Are you expecting the team behind the “melt all GPU farms” pivotal act to be backed by a major government or coalition of governments?
If not, I expect that the team and its AGI will be arrested/confiscated by the nearest authority as soon as the pivotal act occurs, and forced by them to apply the AGI to other goals. Do you see things happening differently, or expect things to come out well despite this?
“Melt all GPUs” is indeed an unrealistic pivotal act—which is why I talk about it, since like any pivotal act it is outside the Overton Window, and then if any children get indignant about the prospect of doing something other than letting the world end miserably, I get to explain the child-reassuring reasons why you would never do the particular thing of “melt all GPUs” in real life. In this case, the reassuring reason is that deploying open-air nanomachines to operate over Earth is a huge alignment problem, that is, relatively huger than the least difficult pivotal act I can currently see.
That said, if unreasonably-hypothetically you can give your AI enough of a utility function and have it deploy enough intelligence to create nanomachines that safely move through the open-ended environment of Earth’s surface, avoiding bacteria and not damaging any humans or vital infrastructure, in order to surveil all of Earth and find the GPU farms and then melt them all, it’s probably not very much harder to tell those nanomachines to melt other things, or demonstrate the credibly threatening ability to do so.
That said, I indeed don’t see how we sociologically get into this position in a realistic way, in anything like the current world, even assuming away the alignment problem. Unless Demis Hassabis suddenly executes an emergency pact with the Singaporean government, or something else I have trouble visualizing? I don’t see any of the current owners or local governments of the big AI labs knowingly going along with any pivotal act executed deliberately (though I expect them to think it’s just fine to keep cranking up the dial on an AI until it destroys the world, so long as it looks like it’s not being done on purpose).
It is indeed the case that, conditional on the alignment problem being solvable, there’s a further sociological problem—which looks a lot less impossible, but which I do not actually know how to solve—wherein you then have to do something pivotal, and there’s no grownups in government in charge who would understand why that was something necessary to do. But it’s definitely a lot easier to imagine Demis forming a siloed team or executing an emergency pact with Singapore, than it is to see how you would safely align the AI that does it. And yes, the difficulty of any pivotal act to stabilize the Earth includes the difficulty of what you had to do, before or after you had sufficiently powerful AGI, in order to execute that act and then prevent things from falling over immediately afterwards.
Do you have a plan to communicate the content of this (the least difficult pivotal act you currently see) to people whom it would be beneficial to communicate it to? E.g., write about it in some deniable way, or should such people just ask you about it privately? Or more generally, how do you think that discussions / intellectual progress on this topic should go?
Do you think the least difficult pivotal act you currently see has sociopolitical problems that are similar to “melt all GPUs”?
Thanks for the clarification. I suggest mentioning this more often (like in the Arbital page), as I previously didn’t think that your version of “pivotal act” had a significant sociopolitical component. If this kind of pivotal act is indeed how the world gets saved (conditional on the world being saved), one of my concerns is that “a miracle occurs” and the alignment problem gets solved, but the sociopolitical problem doesn’t because nobody was working on it (even if it’s easier in some sense).
(Not a high priority to discuss this here and now, but) I’m skeptical that backing by a small government like Singapore is sufficient, since any number of major governments would be very tempted to grab the AGI(+team) from the small government, and the small government will be under tremendous legal and diplomatic stress from having nonconsensually destroyed a lot of other people’s very valuable property. Having a partially aligned/alignable AGI in the hands of a small, geopolitically weak government seems like a pretty precarious state.
Singapore probably looks a lot less attractive to threaten if it’s allied with another world power that can find and melt arbitrary objects.
I’m still unsure how true I think this is.
Clearly a full Butlerian jihad (where all of the computers are destroyed) suspends AGI development indefinitely, and destroying no computers doesn’t slow it down at all. There’s a curve then where the more computers you destroy, the more you both 1) slow down AGI development and 2) disrupt the economy (since people were using those to keep their supply chains going, organize the economy, do lots of useful work, play video games, etc.).
But even if you melt all the GPUs, I think you have two obstacles:
1) CPUs alone can do lots of the same stuff. There’s some paper I was thinking of from ~5 years ago where they managed to get a CPU farm competitive with the GPUs of the time, and it might have been this paper (whose authors are all from Intel, who presumably have a significant bias) or it might have been the Hogwild-descended stuff (like this); hopefully someone knows something more up to date. (A rough sketch of the Hogwild-style pattern follows this list.)
2) The chip design ecosystem gets to react to your ubiquitous nanobots and reverse-engineer what features they’re looking for to distinguish between whitelisted CPUs and blacklisted GPUs; they may be able to design an ML accelerator that fools the nanomachines. (Something that’s robust to countermoves might have to eliminate many more current chips.)
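For concreteness, here is a minimal sketch of the Hogwild-style pattern referenced in obstacle 1): several CPU worker processes apply lock-free SGD updates to one shared set of parameters. It assumes PyTorch; the tiny model, random data, and hyperparameters are placeholders, not anything taken from the papers mentioned above.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def train(model, steps=100):
    # Each worker runs plain SGD against the shared parameters, with no locking (Hogwild).
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(steps):
        x = torch.randn(32, 64)           # placeholder batch
        y = torch.randint(0, 10, (32,))   # placeholder labels
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    model.share_memory()  # put parameters in shared memory so workers update them in place
    workers = [mp.Process(target=train, args=(model,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

The point is only that the ingredients for CPU-only training are commodity ones; whether a CPU farm is actually cost-competitive with GPUs is the empirical question the papers mentioned above disagree about.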
I agree you might need to make additional moves to keep the table flipped, but in a scenario like this you would actually have the capability to make those moves.
Is the plan just to destroy all computers with, say, >1e15 flops of computing power? How does the nanobot swarm know what a “computer” is? What do you do about something like GPT-Neo or SETI@home where the compute is distributed?
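To make the distributed-compute worry concrete, here is a back-of-the-envelope sketch; the per-device throughput numbers are rough order-of-magnitude assumptions of mine, not figures from this thread.

```python
# Illustrative arithmetic only; device throughputs below are rough assumptions.
THRESHOLD = 1e15  # hypothetical "destroy anything above this" line, in FLOP/s

devices = {
    "consumer CPU (~8 cores)": 5e11,  # assumed ~0.5 TFLOP/s
    "consumer GPU":            3e13,  # assumed ~30 TFLOP/s
    "datacenter accelerator":  3e14,  # assumed ~300 TFLOP/s
}

for name, flops in devices.items():
    if flops > THRESHOLD:
        status = "over the threshold on its own"
    else:
        units = THRESHOLD / flops
        status = f"needs ~{units:,.0f} networked units to reach 1e15 together"
    print(f"{name:24s}: {flops:.0e} FLOP/s, {status}")
```

On these assumptions no single device crosses the line, so a per-device rule says nothing about a few dozen consumer GPUs or a couple thousand volunteer CPUs pooled over a network, which is exactly the GPT-Neo / SETI@home case.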
I’m still confused as to why you think the task “build an AI that destroys anything with >1e15 flops of computing power (except humans, of course)” would be dramatically easier than the alignment problem.
Setting back civilization a generation (via catastrophe) seems relatively straightforward. Building a social consensus/religion that destroys anything “in the image of a mind” at least seems possible. Fine-tuning a nanobot swarm to destroy some but not all computers just sounds really hard to me.