People who don’t want to die on the main line should be doing, or planning, something other than trying to win an AGI race… like say flipping tables and trying to foment nuclear war or something.
How does fomenting nuclear war change anything? The basic logic for ‘let’s also question our assumptions and think about whether there’s some alternative option’ is sound (and I mostly like your decomposition of Eliezer’s view), but you do need to have the alternative plan actually end up solving the problem.
Specific proposals and counter-proposals (that chain all the way to ‘awesome intergalactic future’) are likely the best way to unearth cruxes and figure out what makes sense to do here. Just saying ‘let’s consider third options’ or ‘let’s flip the tables somehow’ won’t dissuade Eliezer because it’s not a specific promising-looking plan (and he thinks he’s already ruled out enough plans like this to make them even doomier than AGI-alignment-mediated plans).
Eliezer is saying ‘there isn’t an obvious path forward, so we should figure out how to best take advantage of future scenarios where there are positive model violations (“miracles”)’; he’s not saying ‘we’re definitely doomed, let’s give up’. If you agree but think that something else gives us better/likelier miracles than trying to align AGI, then that’s a good place to focus discussion.
One reason I think Eliezer tends to be unpersuaded by alternatives to alignment is that they tend to delay the problem without solving it. Another reason is that Eliezer thinks AGI and alignment are to a large extent unknown quantities, which gives us more reason to expect positive model violations; e.g., “maybe if X happened the world’s strongest governments would suddenly set aside their differences and join in harmony to try to handle this issue in a reasonable way” also depends on positive violations of Eliezer’s model, but they’re violations of generalizations that have enormously more supporting data.
We don’t know much about how early AGI systems tend to work, or how alignment tends to work; but we know an awful lot about how human governments (and human groups, and human minds) tend to work.
Nuclear war was just an off-the-top example meant to illustrate how far you might want to go. And I did admit that it would probably basically be a delaying tactic.
If I thought ML was as likely to “go X-risk” as Eliezer seems to, then I personally would want to go for the “grab probability on timelines other than what you think of as the main one” approach, not the “nuclear war” approach. And obviously I wouldn’t treat nuclear war as the first option for flipping tables… but just as obviously I can’t come up with a better way to flip tables off the top of my head.
If you did the nuclear war right, you might get hundreds or thousands of years of delay, with a somewhat higher probability (though still low in absolute terms) than the one I (and I think Eliezer) give to your being able to control[1] ML-based AGI. That’s not nothing. But the real point is that if you don’t think there’s a way to “flip tables”, then you’re better off just conceding the “main line” and trying to save other possibilities, even if they’re much less probable.
I don’t like the word “alignment”. It admits too many dangerous associations and interpretations. It doesn’t require them, but I think it carries a risk of distorting one’s thoughts.
I think there are some ways of flipping tables that offer some hope (albeit a longshot) of actually getting us into a better position to solve the problem, rather than just delaying the issue. Basically, strategies for suppressing or controlling Earth’s supply of compute, while pressing for differential tech development on things like BCIs, brain emulation, human intelligence enhancement, etc, plus (if you can really buy lots of time) searching for alternate, easier-to-align AGI paradigms, and making improvements to social technology / institutional decisionmaking (prediction markets, voting systems, etc).
I would write more about this, but I’m not sure if MIRI / LessWrong / etc want to encourage lots of public speculation about potentially divisive AGI “nonpharmaceutical interventions” like fomenting nuclear war. I think it’s an understandably sensitive area, which people would prefer to discuss privately.
Private discussion, though, can also lead to pretty horrific scenarios where a small group of people does something incredibly dumb and dangerous without outside voices to pull them back from sufficiently risky actions. Not sure if there is any “good” way to discuss such topics, though…
If I thought ML was as likely to “go X-risk” as Eliezer seems to, then I personally would want to go for the “grab probability on timelines other than what you think of as the main one” approach
I’m not sure what you mean by “grab probability on timelines” here. I think you mean something like ‘since the mainline looks doomy, try to increase P(success) in non-mainline scenarios’.
Which sounds similar to the Eliezer-strategy, except Eliezer seems to think the most promising non-mainline scenarios are different from the ones you’re thinking about. Possibly there’s also a disagreement here related to ‘Eliezer thinks there are enough different miracle-possibilities (each of which is sufficiently low-probability) that it doesn’t make sense to focus in on one of them.’
There’s a different thing you could mean by ‘grab probability on timelines other than what you think of as the main one’, which I don’t think was your meaning, that’s something like: assuming things go well, AGI is probably further in the future than Eliezer thinks. So it makes sense to focus at least somewhat more on longer-timeline scenarios, while keeping in mind that AGI probably isn’t in fact that far off.
I think MIRI leadership would endorse ‘if things went well, AGI timelines were probably surprisingly long’.
I think you mean something like ‘since the mainline looks doomy, try to increase P(success) in non-mainline scenarios’.
Yes, that’s basically right.
I didn’t bring up the “main line”, and I thought I was doing a pretty credible job of following the metaphor.
Take a simplified model where a final outcome can only be “good” (mostly doom-free) or “bad” (very rich in doom). There will be a single “winning” AGI, which will simply be the first to cross some threshold of capability. This cannot be permanently avoided. The winning AGI will completely determine whether the outcome is good or bad. We’ll call a friendly-aligned-safe-or-whatever AGI that creates a good outcome a “good AGI”, and one that creates a bad outcome a “bad AGI”. A randomly chosen AGI will be bad with probability 0.999.
You want to influence the creation of the winning AGI to make sure it’s a good one. You have certain finite resources to apply to that: time, attention, intelligence, influence, money, whatever.
Suppose that you think that there’s a 0.75 probability that something more or less like current ML systems will win (that’s the “ML timeline” and presumptively the “main line”). Unfortunately, you also believe that there’s only 0.05 probability that there’s any path at all to find a way for an AGI with an “ML architecture” to be good, within whatever time it takes for ML to win (probably there’s some correlation between how long it takes ML to win and how long it takes to figure out how to make it good). Again, that’s the probability that it’s possible in the abstract to invent good ML in the available time, not the probability that it will actually be invented and get deployed.
Contingent on the ML-based approach winning, and assuming you don’t do anything yourself, you think there’s maybe a 0.01 probability that somebody else will actually arrange for the winning AGI to be good. You’re damned good, so if you dump all of your attention and resources into it, you can double that to 0.02 even though lots of other people are working on ML safety. So you would contribute the 0.01 you added times 0.75, or 0.0075 probability, to a good outcome. Or at least you hope you would; you do not at this time have any idea how to actually go about it.
Now suppose that there’s some other AGI approach, call it X. X could also be a family of approaches. You think that X has, say, 0.1 probability of actually winning instead of ML (which leaves 0.15 for outcomes that are neither X nor ML). But you think that X is more tractable than ML; there’s a 0.75 probability that X can in principle be made good before it wins.
Contingent on X winning, there’s a 0.1 probability that somebody else will arrange for X to be good without you. But at the moment everybody is working on ML, which gives you runway to work on X before capability on the X track starts to rise. So with all of your resources, you could really increase the overall attention being paid to X, and raise that to 0.3. You would then have contributed 0.2 times 0.1 or 0.02 probability to a good outcome. And you have at least a vague idea how to make progress on the problem, which is going to be good for your morale.
Or maybe there’s a Y that only has a 0.05 probability of winning, but you have some nifty and unique idea that you think has a 0.9 probability of making Y good, so you can get nearly 0.045 even though Y is itself an unlikely winner.
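The back-of-the-envelope comparison above can be sketched in a few lines of code (the numbers are the illustrative ones from the model; the function name and the zero baseline for Y are my assumptions):

```python
# Expected probability mass you personally add to a good outcome by going
# all-in on one AGI approach:
#   contribution = P(approach wins) * (P(good | you help) - P(good | you don't))

def contribution(p_win, p_good_without_you, p_good_with_you):
    """Your marginal contribution to the probability of a good outcome."""
    return p_win * (p_good_with_you - p_good_without_you)

# ML: likely to win, but nearly intractable and crowded.
ml = contribution(p_win=0.75, p_good_without_you=0.01, p_good_with_you=0.02)

# X: unlikely to win, but tractable and neglected.
x = contribution(p_win=0.10, p_good_without_you=0.10, p_good_with_you=0.30)

# Y: even less likely to win, but you have a nifty, unique idea
# (baseline without you assumed ~0 here).
y = contribution(p_win=0.05, p_good_without_you=0.0, p_good_with_you=0.90)

print(round(ml, 4))  # 0.0075
print(round(x, 4))   # 0.02
print(round(y, 4))   # 0.045
```

The point of the toy calculation is that an approach’s probability of winning can be swamped by its tractability and neglectedness: X and Y each buy several times more probability mass than the crowded, near-intractable ML path.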
Obviously these are sensitive to the particular probabilities you assign, and I am not really very well placed to assign such probabilities, but my intuition is that there are going to be productive Xs and Ys out there.
I may be biased by the fact that, to whatever degree I can assign such probabilities, I think that ML’s probability of winning, in the very Manichean sense I’ve set up here, where it remakes the whole world, is more like 0.25 than 0.75. But even if it’s 0.75, which I suspect is closer to what Eliezer thinks (and would be most of his “0.85 by 2070”), ML is still handicapped by there not being any obvious way to apply resources to it.
Sure, you can split your resources. And that might make sense if there’s low-hanging fruit on one or more branches. But I didn’t see anything in that transcript that suggested doing that. And you would still want to put the most resources on the most productive paths, rather than concentrating on a moon shot to fix ML when that doesn’t seem doable.
0.85 by 2070 was Nate Soares’ probability, not Eliezer’s.