Written and forecasted quickly; the numbers are very rough. Thomas requested that I make a forecast before anchoring on his comment (and I also haven’t read others’ comments).
I’ll make a forecast for the question: What’s the chance a set of >=1 warning shots counterfactually tips the scales between doom and a flourishing future, conditional on a default of doom without warning shots?
We can roughly break this down into four components (multiplied together, as sketched below):
1. Chance >=1 warning shot happens
2. Chance the alignment community / EA has a plan to react well to a warning shot
3. Chance the alignment community / EA has enough influence to get the plan executed
4. Chance the implemented plan tips the scales between doom and a flourishing future
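To make the conditioning explicit (each later estimate is implicitly conditional on the earlier steps happening), the quantity I’m estimating is roughly the following product. This is just my framing of the multiplication, nothing more formal:

$$P(\text{counterfactual tip}) \approx P(1) \cdot P(2 \mid 1) \cdot P(3 \mid 1,2) \cdot P(4 \mid 1,2,3)$$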
I’ll now give rough probabilities:
Chance >=1 warning shot happens: 75%
My current view on takeoff is closer to Daniel Kokotajlo-esque fast-ish takeoff than to Paul-esque slow takeoff. But I’d guess that even in the DK world we should expect some significant warning shots; we’d just have less time to react to them.
I’ve also updated recently toward thinking the “warning shot” doesn’t need to be a particularly accurate representation of what we care about in order to be leveraged. As long as we have a plan ready to react to something that makes people scared of AI, it might not matter much whether the warning shot accurately represents the alignment community’s biggest fears.
Chance the alignment community / EA has a plan to react well to a warning shot: 50%
Scenario planning is hard, and I doubt we currently have very good plans. But I think there are a bunch of talented people working on this, and I’m planning on helping :)
Chance the alignment community / EA has enough influence to get the plan executed: 35%
I’m relatively optimistic about having some level of influence; it seems to me like we’re gaining influence over time, and right now we’re more bottlenecked on plans than on influence. That said, depending on how drastic the plan is, we may need much more (or less) influence, and the best plans could potentially be quite drastic.
Chance the implemented plan tips the scales between doom and a flourishing future, conditional on doom being the default without warning shots: 5%
This is obviously just a quick gut-level guess. I generally think AI risk is pretty intractable and hard to tip the scales on, even though it’s super important, but warning shots may open a window for pretty drastic actions, conditional on (1)-(3).
Multiplying these together gives ~0.66%, which might sound low but seems pretty high in my book as far as making a difference on AI risk is concerned.
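As a quick sanity check on the arithmetic, here’s a minimal sketch (variable names are mine, purely illustrative):

```python
# Rough product of the four estimates above (variable names are illustrative).
p_warning_shot = 0.75  # (1) >=1 warning shot happens
p_good_plan = 0.50     # (2) a good reaction plan exists, given (1)
p_influence = 0.35     # (3) enough influence to execute it, given (1)-(2)
p_tips_scales = 0.05   # (4) the plan tips the scales, given (1)-(3)

p_total = p_warning_shot * p_good_plan * p_influence * p_tips_scales
print(f"{p_total:.2%}")  # 0.66%
```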