Rob Bensinger comments on Discussion with Eliezer Yudkowsky on AGI interventions

Rob Bensinger 12 Nov 2021 3:11 UTC
4 points
If I thought ML was as likely to “go X-risk” as Eliezer seems to, then I personally would want to go for the “grab probability on timelines other than what you think of as the main one” approach
I’m not sure what you mean by “grab probability on timelines” here. I think you mean something like ‘since the mainline looks doomy, try to increase P(success) in non-mainline scenarios’.
Which sounds similar to the Eliezer-strategy, except Eliezer seems to think the most promising non-mainline scenarios are different from the ones you’re thinking about. Possibly there’s also a disagreement here related to ‘Eliezer thinks there are enough different miracle-possibilities (each of which is sufficiently low-probability) that it doesn’t make sense to focus in on one of them.’
There’s a different thing you could mean by ‘grab probability on timelines other than what you think of as the main one’, which I don’t think was your meaning: something like ‘assuming things go well, AGI is probably further in the future than Eliezer thinks, so it makes sense to focus at least somewhat more on longer-timeline scenarios, while keeping in mind that AGI probably isn’t in fact that far off.’
I think MIRI leadership would endorse ‘if things went well, AGI timelines were probably surprisingly long’.
- jbash 12 Nov 2021 5:09 UTC
  7 points
  
  I think you mean something like ‘since the mainline looks doomy, try to increase P(success) in non-mainline scenarios’.
  
  Yes, that’s basically right.
  
  I didn’t bring up the “main line”, and I thought I was doing a pretty credible job of following the metaphor.
  
  Take a simplified model where a final outcome can only be “good” (mostly doom-free) or “bad” (very rich in doom). There will be a single “winning” AGI, which will simply be the first to cross some threshold of capability. This cannot be permanently avoided. The winning AGI will completely determine whether the outcome is good or bad. We’ll call a friendly-aligned-safe-or-whatever AGI that creates a good outcome a “good AGI”, and one that creates a bad outcome a “bad AGI”. A randomly chosen AGI will be bad with probability 0.999.
  
  You want to influence the creation of the winning AGI to make sure it’s a good one. You have certain finite resources to apply to that: time, attention, intelligence, influence, money, whatever.
  
  Suppose that you think that there’s a 0.75 probability that something more or less like current ML systems will win (that’s the “ML timeline” and presumptively the “main line”). Unfortunately, you also believe that there’s only 0.05 probability that there’s any path at all to find a way for an AGI with an “ML architecture” to be good, within whatever time it takes for ML to win (probably there’s some correlation between how long it takes ML to win and how long it takes us to figure out how to make it good). Again, that’s the probability that it’s possible in the abstract to invent good ML in the available time, not the probability that it will actually be invented and get deployed.
  
  Contingent on the ML-based approach winning, and assuming you don’t do anything yourself, you think there’s maybe a 0.01 probability that somebody else will actually arrange for the winning AGI to be good. You’re damned good, so if you dump all of your attention and resources into it, you can double that to 0.02 even though lots of other people are working on ML safety. So you would contribute 0.01 times 0.75, or 0.0075, probability to a good outcome. Or at least you hope you would; you do not at this time have any idea how to actually go about it.
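  A minimal sketch of the arithmetic in this simplified model. The helper `track_contribution` and the constant names are hypothetical, introduced only for illustration; the probabilities are the ones given above. Your contribution on a track is the track's win probability times how much you raise P(winning AGI is good | that track wins).

```python
# Hypothetical helper for the simplified model: increase in P(good outcome)
# that your effort buys on a single track.
def track_contribution(p_track_wins: float, p_good_baseline: float,
                       p_good_with_you: float) -> float:
    """p_track_wins: probability this track produces the winning AGI.
    p_good_baseline: P(winner is good | track wins), without you.
    p_good_with_you: same conditional probability, with all your resources."""
    return p_track_wins * (p_good_with_you - p_good_baseline)


P_ML_WINS = 0.75
P_ML_SOLVABLE = 0.05  # ceiling: P(good ML can be invented in time at all)

baseline, with_you = 0.01, 0.02
# Neither conditional success probability can exceed the solvability ceiling.
assert baseline <= with_you <= P_ML_SOLVABLE

ml = track_contribution(P_ML_WINS, baseline, with_you)
print(f"{ml:.4f}")  # 0.0075
```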
  
  Now suppose that there’s some other AGI approach, call it X. X could also be a family of approaches. You think that X has, say, 0.1 probability of actually winning instead of ML (which leaves 0.15 for outcomes that are neither X nor ML). But you think that X is more tractable than ML; there’s a 0.75 probability that X can in principle be made good before it wins.
  
  Contingent on X winning, there’s a 0.1 probability that somebody else will arrange for X to be good without you. But at the moment everybody is working on ML, which gives you runway to work on X before capability on the X track starts to rise. So with all of your resources, you could really increase the overall attention being paid to X, and raise that to 0.3. You would then have contributed 0.2 times 0.1 or 0.02 probability to a good outcome. And you have at least a vague idea how to make progress on the problem, which is going to be good for your morale.
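  The same arithmetic for X, as a sketch using only the numbers stated above (variable names are illustrative). It also checks that the 0.3 you reach stays under X's 0.75 solvability ceiling, and recovers the 0.15 left over for winners that are neither ML nor X.

```python
P_ML_WINS = 0.75
P_X_WINS = 0.1
P_X_SOLVABLE = 0.75  # P(X can in principle be made good before it wins)

# Probability that the winner is neither ML nor X.
p_neither = 1.0 - P_ML_WINS - P_X_WINS

# P(winning X is good): without you, and with all your resources on X.
baseline_x, with_you_x = 0.1, 0.3
assert 0.0 <= baseline_x <= with_you_x <= P_X_SOLVABLE

x_gain = P_X_WINS * (with_you_x - baseline_x)
print(f"neither: {p_neither:.2f}, your X contribution: {x_gain:.4f}")
# neither: 0.15, your X contribution: 0.0200
```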
  
  Or maybe there’s a Y that only has a 0.05 probability of winning, but you have some nifty and unique idea that you think has a 0.9 probability of making Y good, so you can get nearly 0.045 even though Y is itself an unlikely winner.
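  For Y, the comment gives a win probability and a success probability but no baseline (what others would achieve without you), which may be why the figure is "nearly" 0.045 rather than exactly 0.045. A sketch, where the 0.02 baseline is a purely hypothetical value chosen only to show the effect:

```python
P_Y_WINS = 0.05
P_Y_GOOD_WITH_YOU = 0.9  # with your "nifty and unique idea"

gains = {}
for baseline_y in (0.0, 0.02):  # 0.02 is hypothetical; the comment gives none
    gains[baseline_y] = P_Y_WINS * (P_Y_GOOD_WITH_YOU - baseline_y)
    print(f"baseline {baseline_y}: contribution {gains[baseline_y]:.4f}")
# baseline 0.0: contribution 0.0450
# baseline 0.02: contribution 0.0440
```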
  
  Obviously these are sensitive to the particular probabilities you assign, and I am not really very well placed to assign such probabilities, but my intuition is that there are going to be productive Xs and Ys out there.
  
  I may be biased by the fact that, to whatever degree I can assign probabilities, I think that ML’s probability of winning, in the very Manichean sense I’ve set up here, where it remakes the whole world, is more like 0.25 than 0.75. But even if it’s 0.75, which I suspect is closer to what Eliezer thinks (and would be most of his “0.85 by 2070”), ML is still handicapped by there not being any obvious way to apply resources to it.
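  A rough sensitivity check on that point, assuming (as a simplification not stated above) that X's numbers stay fixed while ML's win probability moves between the two values mentioned. Because your ML uplift is only 0.01 even conditional on ML winning, these particular numbers never favor ML over X's 0.02, whatever ML's win probability is.

```python
# Your uplift conditional on ML winning, from the ML example (0.01 -> 0.02).
ML_UPLIFT = 0.02 - 0.01
# X's contribution from the X example, held fixed here (a simplifying assumption).
X_GAIN = 0.1 * (0.3 - 0.1)

results = {}
for p_ml in (0.25, 0.75):
    ml_gain = p_ml * ML_UPLIFT
    results[p_ml] = ml_gain
    better = "X" if X_GAIN > ml_gain else "ML"
    print(f"P(ML wins)={p_ml}: ML gain {ml_gain:.4f}, "
          f"X gain {X_GAIN:.4f}, better: {better}")
```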
  
  Sure, you can split your resources. And that might make sense if there’s low-hanging fruit on one or more branches. But I didn’t see anything in that transcript that suggested doing that. And you would still want to put the most resources on the most productive paths, rather than concentrating on a moon shot to fix ML when that doesn’t seem doable.
  - Rob Bensinger 12 Nov 2021 6:06 UTC
    4 points
    which I suspect is closer to what Eliezer thinks (and would be most of his “0.85 by 2070”)
    0.85 by 2070 was Nate Soares’ probability, not Eliezer’s.