If I really thought AI was going to murder us all in the next 6 months to 2 years, I would definitely consider those 10 years “pivotal”, since it would give us 5x-20x the time to solve the alignment problem. I might even go full Butlerian Jihad and just ban semiconductor fabs altogether.
Actually, I think the right question is: is there anything you would consider pivotal other than just solving the alignment problem? If not, the whole argument seems to be “If we can’t find a safe way to solve the alignment problem, we should consider dangerous ones.”
[Update: As of today, Nov. 16 (after checking with Eliezer), I’ve edited the Arbital page to define “pivotal act” the way it’s usually used: to refer to a good gameboard-flipping action, not e.g. ‘AI destroys humanity’. The quote below uses the old definition, where ‘pivotal’ meant anything world-destroying or world-saving.]
Eliezer’s using the word “pivotal” here to mean something relatively specific, described on Arbital:
The term ‘pivotal’ in the context of value alignment theory is a guarded term to refer to events, particularly the development of sufficiently advanced AIs, that will make a large difference a billion years later. A ‘pivotal’ event upsets the current gameboard—decisively settles a win or loss, or drastically changes the probability of win or loss, or changes the future conditions under which a win or loss is determined.
[...]
Examples of pivotal and non-pivotal events
Pivotal events:
non-value-aligned AI is built, takes over universe
human intelligence enhancement powerful enough that the best enhanced humans are qualitatively and significantly smarter than the smartest non-enhanced humans
upload humans and run them at speeds more comparable to those of an AI
prevent the origin of all hostile superintelligences (in the nice case, only temporarily and via strategies that cause only acceptable amounts of collateral damage)
design or deploy nanotechnology such that there exists a direct route to the operators being able to do one of the other items on this list (human intelligence enhancement, prevent emergence of hostile SIs, etc.)
a complete and detailed synaptic-vesicle-level scan of a human brain results in cracking the cortical and cerebellar algorithms, which rapidly leads to non-value-aligned neuromorphic AI
Non-pivotal events:
curing cancer (good for you, but it didn’t resolve the value alignment problem)
proving the Riemann Hypothesis (ditto)
an extremely expensive way to augment human intelligence by the equivalent of 5 IQ points that doesn’t work reliably on people who are already very smart
making a billion dollars on the stock market
robotic cars devalue the human capital of professional drivers, and mismanagement of aggregate demand by central banks plus burdensome labor market regulations is an obstacle to their re-employment
Borderline cases:
unified world government with powerful monitoring regime for ‘dangerous’ technologies
widely used gene therapy that brought anyone up to a minimum equivalent IQ of 120
Centrality to limited AI proposals
We can view the general problem of Limited AI as having the central question: What is a pivotal positive accomplishment, such that an AI which does that thing and not some other things is therefore a whole lot safer to build? This is not a trivial question because it turns out that most interesting things require general cognitive capabilities, and most interesting goals can require arbitrarily complicated value identification problems to pursue safely.
It’s trivial to create an “AI” which is absolutely safe and can’t be used for any pivotal achievements. E.g. Google Maps, or a rock with “2 + 2 = 4” painted on it.
[...]
Centrality to concept of ‘advanced agent’
We can view the notion of an advanced agent as “agent with enough cognitive capacity to cause a pivotal event, positive or negative”; the advanced agent properties are either those properties that might lead up to participation in a pivotal event, or properties that might play a critical role in determining the AI’s trajectory and hence how the pivotal event turns out.
In conversations I’ve seen that use the word “pivotal”, it’s usually asking about pivotal acts we can do that end the acute x-risk period (things that make it the case that random people in the world can’t suddenly kill everyone with AGI or bioweapons or what-have-you). I.e., it’s specifically focused on good pivotal acts.
IMO it’s confusing that Eliezer uses the word “pivotal” on Arbital to also refer to ways AI could destroy the world. If we’re talking about stuff like “what’s the easiest pivotal act?” or “how hard do pivotal acts tend to be?”, I’ll give wildly different answers if I’m including ‘ways to destroy the world’ and not just ‘ways to save the world’—destroying the world seems drastically easier to me. And I don’t know of an unambiguous short synonym for ‘good pivotal act’.
(Eliezer proposes ‘pivotal achievement’, but empirically I don’t see people using this much, and it still has the same problem that it re-uses the word ‘pivotal’ for both categories of event, thus making them feel very similar.)
Usually I care about either ‘ways of saving the world’ or ‘ways of destroying the world’—I rarely find myself needing a word for the superset. E.g., I’ll find myself searching for a short term to express things like ‘the first AGI company needs to look for a way-to-save-the-world’ or ‘I wish EAs would spend more time thinking about ways-to-use-AGI-to-save-the-world’. But if I say ‘pivotal’, this will technically include x-catastrophes, which is not what I have in mind.
(On the other hand, the concept of ‘the kind of AI that’s liable to cause pivotal events’ does make sense to me and feels very useful, because I think AGI gets you both the world-saving and the world-destroying capabilities in one fell swoop (though not necessarily the ability to align AGI to actually utilize the capabilities you want). But given my beliefs about AGI, I’m satisfied with just using the term ‘AGI’ to refer to ‘the kind of AI that’s liable to cause pivotal events’. Eliezer’s more-specifically-about-pivotal-events term for this on Arbital, ‘advanced agent’, seems fine to me too.)
Update: Eliezer has agreed to let me edit the Arbital article to follow more standard usage nowadays, with ‘pivotal acts’ referring to good gameboard-flipping actions. The article will use ‘existential catastrophe’ to refer to bad gameboard-flipping events, and ‘astronomically significant event’ to refer to the superset. Will re-quote the article here once there’s a new version.
New “pivotal act” page:
The term ‘pivotal act’ in the context of AI alignment theory is a guarded term to refer to actions that will make a large positive difference a billion years later. Synonyms include ‘pivotal achievement’ and ‘astronomical achievement’.
We can contrast this with existential catastrophes (or ‘x-catastrophes’), events that will make a large negative difference a billion years later. Collectively, this page will refer to pivotal acts and existential catastrophes as astronomically significant events (or ‘a-events’).
‘Pivotal event’ is a deprecated term for referring to astronomically significant events, and ‘pivotal catastrophe’ is a deprecated term for existential catastrophes. ‘Pivotal’ was originally used to refer to the superset (a-events), but AI alignment researchers kept running into the problem of lacking a crisp way to talk about ‘winning’ actions in particular, and their distinctive features.
Usage has therefore shifted such that (as of late 2021) researchers use ‘pivotal’ and ‘pivotal act’ to refer to good events that upset the current gameboard—events that decisively settle a win, or drastically increase the probability of a win.
Under this definition, it seems that “nuke every fab on Earth” would qualify as “borderline”, and every outcome that is both “pivotal” and “good” depends on solving the alignment problem.
Pivotal in this case is a technical term (whose article opens with an explicit bid for people not to stretch the definition of the term). It’s not (by definition) limited to ‘solving the alignment problem’, but there are constraints on what counts as pivotal.
If you can deploy nanomachines that melt all the GPU farms and prevent any new systems with more than 1 networked GPU from being constructed, that counts. That really actually suspends AGI development indefinitely pending an unlock, and not just for a brief spasmodic costly delay.
Can you please clarify:
Are you expecting the team behind the “melt all GPU farms” pivotal act to be backed by a major government or coalition of governments?
If not, I expect that the team and its AGI will be arrested/confiscated by the nearest authority as soon as the pivotal act occurs, and forced by them to apply the AGI to other goals. Do you see things happening differently, or expect things to come out well despite this?
“Melt all GPUs” is indeed an unrealistic pivotal act—which is why I talk about it, since like any pivotal act it is outside the Overton Window, and then if any children get indignant about the prospect of doing something other than letting the world end miserably, I get to explain the child-reassuring reasons why you would never do the particular thing of “melt all GPUs” in real life. In this case, the reassuring reason is that deploying open-air nanomachines to operate over Earth is a huge alignment problem, that is, relatively huger than the least difficult pivotal act I can currently see.
That said, if unreasonably-hypothetically you can give your AI enough of a utility function and have it deploy enough intelligence to create nanomachines that safely move through the open-ended environment of Earth’s surface, avoiding bacteria and not damaging any humans or vital infrastructure, in order to surveil all of Earth and find the GPU farms and then melt them all, it’s probably not very much harder to tell those nanomachines to melt other things, or demonstrate the credibly threatening ability to do so.
That said, I indeed don’t see how we sociologically get into this position in a realistic way, in anything like the current world, even assuming away the alignment problem. Unless Demis Hassabis suddenly executes an emergency pact with the Singaporean government, or something else I have trouble visualizing? I don’t see any of the current owners or local governments of the big AI labs knowingly going along with any pivotal act executed deliberately (though I expect them to think it’s just fine to keep cranking up the dial on an AI until it destroys the world, so long as it looks like it’s not being done on purpose).
It is indeed the case that, conditional on the alignment problem being solvable, there’s a further sociological problem—which looks a lot less impossible, but which I do not actually know how to solve—wherein you then have to do something pivotal, and there’s no grownups in government in charge who would understand why that was something necessary to do. But it’s definitely a lot easier to imagine Demis forming a siloed team or executing an emergency pact with Singapore, than it is to see how you would safely align the AI that does it. And yes, the difficulty of any pivotal act to stabilize the Earth includes the difficulty of what you had to do, before or after you had sufficiently powerful AGI, in order to execute that act and then prevent things from falling over immediately afterwards.
the least difficult pivotal act I can currently see.
Do you have a plan to communicate the content of this to people whom it would be beneficial to communicate to? E.g., write about it in some deniable way, or should such people just ask you about it privately? Or more generally, how do you think that discussions / intellectual progress on this topic should go?
Do you think the least difficult pivotal act you currently see has sociopolitical problems that are similar to “melt all GPUs”?
That said, I indeed don’t see how we sociologically get into this position in a realistic way, in anything like the current world, even assuming away the alignment problem.
Thanks for the clarification. I suggest mentioning this more often (like in the Arbital page), as I previously didn’t think that your version of “pivotal act” had a significant sociopolitical component. If this kind of pivotal act is indeed how the world gets saved (conditional on the world being saved), one of my concerns is that “a miracle occurs” and the alignment problem gets solved, but the sociopolitical problem doesn’t because nobody was working on it (even if it’s easier in some sense).
But it’s definitely a lot easier to imagine Demis forming a siloed team or executing an emergency pact with Singapore
(Not a high priority to discuss this here and now, but) I’m skeptical that backing by a small government like Singapore is sufficient, since any number of major governments would be very tempted to grab the AGI(+team) from the small government, and the small government will be under tremendous legal and diplomatic stress from having nonconsensually destroyed a lot of very valuable other people’s property. Having a partially aligned/alignable AGI in the hands of a small, geopolitically weak government seems like a pretty precarious state.
Singapore probably looks a lot less attractive to threaten if it’s allied with another world power that can find and melt arbitrary objects.
I’m still unsure how true I think this is.
Clearly a full Butlerian jihad (where all of the computers are destroyed) suspends AGI development indefinitely, and destroying no computers doesn’t slow it down at all. There’s a curve then where the more computers you destroy, the more you both 1) slow down AGI development and 2) disrupt the economy (since people were using those to keep their supply chains going, organize the economy, do lots of useful work, play video games, etc.).
But even if you melt all the GPUs, I think you have two obstacles:
CPUs alone can do lots of the same stuff. There’s some paper I was thinking of from ~5 years ago where they managed to get a CPU farm competitive with the GPUs of the time, and it might have been this paper (whose authors are all from Intel, who presumably have a significant bias) or it might have been the Hogwild-descended stuff (like this); hopefully someone knows something more up to date. (A rough sketch of the Hogwild-style idea follows this list.)
The chip design ecosystem gets to react to your ubiquitous nanobots and reverse-engineer what features they’re looking for to distinguish between whitelisted CPUs and blacklisted GPUs; they may be able to design an ML accelerator that fools the nanomachines. (Something that’s robust to countermoves might have to eliminate many more current chips.)
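For concreteness, here’s a minimal, hypothetical sketch of the Hogwild-style idea: several CPU worker processes apply lock-free SGD updates to a single shared weight vector, here on a toy linear-regression problem. It’s only meant to illustrate the mechanism by which commodity CPUs can be pooled for training; it makes no claim about how competitive such setups are with GPUs, and all the names and constants are made up for the example.

```python
# Toy Hogwild-style training: several processes update shared weights
# asynchronously, with no locking. Illustrative sketch only.
import numpy as np
from multiprocessing import Array, Process

N_FEATURES = 10
N_WORKERS = 4
STEPS_PER_WORKER = 5000
LR = 0.01

def worker(shared_w, seed):
    # View the shared buffer as a numpy array; every worker writes into it
    # in place without acquiring any lock (the Hogwild trick: when updates
    # rarely conflict, the occasional race costs little accuracy).
    w = np.frombuffer(shared_w.get_obj())
    true_w = np.arange(N_FEATURES, dtype=float)  # target the workers try to recover
    rng = np.random.default_rng(seed)
    for _ in range(STEPS_PER_WORKER):
        x = rng.normal(size=N_FEATURES)
        y = true_w @ x
        grad = (w @ x - y) * x  # gradient of 0.5*(w.x - y)^2 for this sample
        w -= LR * grad          # unsynchronized in-place update

if __name__ == "__main__":
    shared_w = Array("d", N_FEATURES)  # zero-initialized doubles in shared memory
    procs = [Process(target=worker, args=(shared_w, s)) for s in range(N_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("recovered weights:", np.round(np.frombuffer(shared_w.get_obj()), 2))
```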
I agree you might need to make additional moves to keep the table flipped, but in a scenario like this you would actually have the capability to make those moves.
Is the plan just to destroy all computers with, say, >1e15 flops of computing power? How does the nanobot swarm know what a “computer” is? What do you do about something like GPT-neo or SETI-at-home where the compute is distributed?
I’m still confused as to why you think the task “build an AI that destroys anything with >1e15 flops of computing power—except humans, of course” would be dramatically easier than the alignment problem.
Setting back civilization a generation (via catastrophe) seems relatively straightforward. Building a social consensus/religion that destroys anything “in the image of a mind” at least seems possible. Fine-tuning a nanobot swarm to destroy some but not all computers just sounds really hard to me.
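As a rough sense of scale for that kind of threshold, here’s a back-of-envelope sketch (the per-device FLOP/s figures are my own illustrative, circa-2021 ballpark assumptions, not numbers from this thread): 1e15 FLOP/s is only a handful of datacenter accelerators but thousands of ordinary machines, which is part of why the distributed-compute case is awkward for any “destroy everything above the threshold” rule.

```python
# Rough, order-of-magnitude arithmetic only; the per-device FLOP/s figures
# below are illustrative circa-2021 ballpark assumptions, not numbers from the thread.
THRESHOLD = 1e15  # FLOP/s cutoff discussed above

devices = {
    "datacenter GPU (mixed-precision tensor ops)": 3e14,
    "datacenter GPU (FP32)": 2e13,
    "high-end desktop CPU": 1e12,
    "typical laptop CPU": 2e11,
}

for name, flops in devices.items():
    print(f"{name}: ~{THRESHOLD / flops:,.0f} devices to aggregate 1e15 FLOP/s")
```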