AGI ruin mostly rests on strong claims about alignment and deployment, not about society

Dustin Moskovitz writes on Twitter:

My intuition is that MIRI’s argument is almost more about sociology than computer science/security (though there is a relationship). People won’t react until it is too late, they won’t give up positive rewards to mitigate risk, they won’t coordinate, the govt is feckless, etc.
And that’s a big part of why it seems overconfident to people, bc sociology is not predictable, or at least isn’t believed to be.

And Stefan Schubert writes:

I think it’s good @robbensinger wrote a list of reasons he expects AGI ruin. It’s well-written.
But it’s notable and symptomatic that ⁹⁄₁₀ reasons relate to the nature of AI systems and only ¹⁄₁₀ (discussed in less detail) to the societal response.

https://www.lesswrong.com/posts/eaDCgdkbsfGqpWazi/the-basic-reasons-i-expect-agi-ruin
Whatever one thinks the societal response will be, it seems like a key determinant of whether there’ll be AGI ruin.
Imo the debate on whether AGI will lead to ruin systematically underemphasises this factor, focusing on technical issues.
It’s useful to distinguish between warnings and all-things-considered predictions in this regard.
When issuing warnings, it makes sense to focus on the technology itself. Warnings aim to elicit a societal response, not predict it.
https://www.lesswrong.com/posts/gEShPto3F2aDdT3RY/sleepwalk-bias-self-defeating-predictions-and-existential
But when you actually try to predict what’ll happen all-things-considered, you need to take the societal response into account in a big way
As such I think Rob’s list is better as a list of reasons we ought to take AGI risk seriously, than as a list of reasons it’ll lead to ruin

My reply is:

It’s true that in my “top ten reasons I expect AGI ruin” list, only one of the sections is about the social response to AGI risk, and it’s a short section. But the section links to some more detailed discussions (and quotes from them in a long footnote).

Also, discussing the adequacy of society’s response before I’ve discussed AGI itself at length doesn’t really work, I think, because I need to argue for what kind of response is warranted before I can start arguing that humanity is putting insufficient effort into the problem.

If you think the alignment problem itself is easy, then I can cite all the evidence in the world regarding “very few people are working on alignment” and it won’t matter.

If you think a slowdown is unnecessary or counterproductive, then I can point out that governments haven’t placed a ceiling on large training runs and you’ll just go “So? Why should they?”

Society’s response can only be inadequate given some model of what’s required for adequacy. That’s a lot of why I factor out that discussion into other posts.^[1]

More importantly, contra Dustin, I don’t see myself as having strong priors or complicated models regarding the social situation.

Eliezer Yudkowsky similarly says he doesn’t have strong predictions about what governments or communities will do in this or that situation (beyond anti-predictions like “they probably won’t do specific thing X that’s wildly different from anything they’ve done before”):

[Ngo][12:26]
The other thing is that, for pedagogical purposes, I think it’d be useful for you to express some of your beliefs about how governments will respond to AI
I think I have a rough guess about what those beliefs are, but even if I’m right, not everyone who reads this transcript will be
[Yudkowsky][12:28]
Why would I be expected to know that? I could talk about weak defaults and iterate through an unending list of possibilities.
Thinking that Eliezer thinks he knows that to any degree of specificity feels like I’m being weakmanned!
[Ngo][12:28]
I’m not claiming you have any specific beliefs
[Yudkowsky][12:29]
I suppose I have skepticism when other people dream up elaborately positive and beneficial reactions apparently drawn from some alternate nicer political universe that had an absolutely different response to Covid-19, and so on.
[Ngo][12:29]
But I’d guess that your models rule out, for instance, the US and China deeply cooperating on AI before it’s caused any disasters
[Yudkowsky][12:30]
“Deeply”? Sure. That sounds like something that has never happened, and I’m generically skeptical about political things that go better than any political thing has ever gone before.

I don’t feel pessimistic about society across all domains, I don’t think most tech or scientific progress is at all dangerous or bad, etc. It’s mostly just that AGI looks like a super unusual and hard problem to me.

To imagine civilization behaving really unusually and doing something a lot harder than it’s ever done, I need strong predictive models saying why civilization will do those things. Adequate strategies are conjunctive; I don’t need special knowledge to predict “not that”.

It’s true that this requires a bare minimum model of civilization saying that we aren’t a sane, coordinated super-agent that just handles problems whenever there’s something important to do.

If humanity did consistently strategically scale its efforts with the difficulty and importance of problems in the world (even when weird and abstract analysis is required to see how hard and important the problem is), then I would expect us to just flexibly scale up our efforts and modify all our old heuristics in response to the alignment problem.^[2]

So I’m at least making the anti-prediction “civilization isn’t specifically like that”.

Example: I don’t in fact see my high p(doom) as resting on a strong assumption about whether people will panic and ban a bunch of AI things. My high level of concern is predicated on a reasonable amount of uncertainty about whether that will happen.

The issue is that “people panic and ban things”, while potentially helpful on the margin, does not consistently save the world and cause the long-term future to go well (and there’s a nontrivial number of worlds where it makes things worse on net). The same issue of aligning and wielding powerful tech has to be addressed anyway.

Maybe panic buys us another 5 years, optimistically; maybe it even buys us 20, amazingly. But if superintelligence comes in 2055 rather than 2035, I still very much expect catastrophe. So possibilities like this don’t strongly shift the set of worlds I expect to see toward optimistic outcomes.

Stefan replies on Twitter:

Thanks, Rob, this is helpful.
I do actually think you should put the kinds of arguments you give here [...] in posts like this, since “people will rise to the occasion” seems like one of the key counter-argument to your views; so it seems central to rebut that.
I also think there’s some tension between being uncertain about what the societal response will be and being relatively certain of doom. (Though it depends on the levels of un/certainty.)
I think many would give the simple argument:
P1: Whether there’ll be AI doom depends on the societal response
P2: It’s uncertain what the societal response will be
C: It’s uncertain whether there’ll be AI doom (so P(doom) isn’t very high)
Could be good to address that head on

There’s of course tension! Indeed, I’d phrase it more strongly than that: uncertainty about the societal response is one of the largest reasons I still have any hope for the future. It’s one of the main factors pushing against high p(doom), on my model.

“We don’t know exactly how hard alignment is, and in the end it’s just a technical problem” is plausibly an even larger factor. It’s easier to get clear data about humanity’s coordination ability than to get clear data about how hard alignment is: we have huge amounts of direct observational data about how humans and nations tend to behave, whereas no amount of failed work can rule out the possibility that someone will come up with a brilliant new alignment approach tomorrow that just works.

That said, there are enough visible obstacles to alignment, and enough failed attempts have been made at this point, that I’m willing to strongly bet against a miracle solution occurring (while working to try to prove myself wrong about this).

“Maybe society will coordinate to do something miraculous” and “maybe we’ll find a miraculously effective alignment solution” are possibilities that push in the direction of hope, but they don’t strike me as likely in absolute terms.

The main reason “maybe society will do something miraculous” seems unlikely to me is that the scale of the required miracle seems very large.

This is because:

I think it’s very likely that we’ll need to solve both the alignment problem and the deployment problem in order to see good outcomes.
It seems to me that these two problems both require getting a large number of things right, and some of these things seem very hard, and/or seem to require us to approach the problem in very novel and unusual ways.

“AGI Ruin” and “Capabilities Generalization, and the Sharp Left Turn” make the case for the alignment problem seeming difficult and/or out-of-scope for business-as-usual machine learning.

“Pivotal acts seem hard” and “there isn’t a business-as-usual way to prevent AGI tech from proliferating and killing everyone” illustrate why the deployment problem seems difficult and/or demanding of very novel strategies, and Six Dimensions of Operational Adequacy in AGI Projects fills in a lot of the weird-or-hard details.

When we’re making a large enough ask of civilization (in terms of raw difficulty, and/or in terms of requiring civilization to go wildly off-script and do things in very different ways than it has in the past), we can have a fair amount of confidence that civilization won’t fulfill the ask even if we’re highly uncertain about the specific dynamics at work, the specific course history will take, etc.
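
The conjunctive point can be made concrete with a toy calculation. This is only a sketch: the number of requirements and the per-requirement probabilities below are hypothetical placeholders chosen for illustration, not estimates from this post, and the calculation assumes the requirements succeed or fail independently (which real-world requirements generally do not).

```python
from math import prod


def p_all_succeed(step_probs):
    """Probability that every required step succeeds.

    Assumes the steps are independent, which is a simplification:
    correlated successes or failures would change the result.
    """
    return prod(step_probs)


# Hypothetical scenario: ten things must all go right.
# "Highly uncertain" is modeled here as a coin flip on each one.
uncertain = [0.5] * 10

# A more optimistic hypothetical: each one is 80% likely to go right.
optimistic = [0.8] * 10

print(f"ten coin-flip requirements: {p_all_succeed(uncertain):.4f}")
print(f"ten 80% requirements:       {p_all_succeed(optimistic):.4f}")
```

Under these made-up inputs, substantial uncertainty about each individual requirement still yields a low probability that all of them are met together. The sketch illustrates only the shape of the argument; it says nothing about what the real probabilities are.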

^[1]
It’s also not clear to me what Stefan (or Dustin) would want me to actually say about society, in summarizing my views.
In the abstract, it’s fine to say “society is very important, so it’s weird if only ¹⁄₁₀ of the items discuss society”. But I don’t want to try to give equal time to technical and social issues just for the sake of emphasizing the importance of social factors. If I’m going to add more sentences to a post, I want it to be because the specific claims I’m adding are important, unintuitive, etc. What are the crucial specifics that are missing?
^[2]
Though if we actually lived in that world, we would have already made that observation. A sane world that nimbly adapts its policies in response to large and unusual challenges doesn’t wait until the last possible minute to snatch victory from the jaws of defeat; it gets to work on the problem too early, tries to leave itself plenty of buffer, etc.