Contra Hanson on AI Risk

Robin Hanson wrote a new post recapping his position on AI risk (LW discussion). I’ve been in the Eliezer AI-risk camp for a while, and while I have huge respect for Robin’s rationality and analytical prowess, the arguments in his latest post seem ineffective at drawing me away from the high-doom-worry position.


Robin begins (emphasis mine):

First, if past trends continue, then sometime in the next few centuries the world economy is likely to enter a transition that lasts roughly a decade, after which it may double every few months or faster, in contrast to our current fifteen year doubling time. (Doubling times have been relatively steady as innovations are typically tiny compared to the world economy.) The most likely cause for such a transition seems to be a transition to an economy dominated by artificial intelligence (AI). Perhaps in the form of brain emulations, but perhaps also in more alien forms. And within a year or two from then, another such transition to an even faster growth mode might plausibly happen.

And adds later in the post:

The roughly decade duration predicted from prior trends for the length of the next transition period seems plenty of time for today’s standard big computer system testing practices to notice alignment issues.
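For a sense of scale, the doubling times in the first quote can be converted into yearly growth factors. This is only a rough sketch: the function name is made up for illustration, and "every few months" is assumed to mean three months, which is my reading rather than a figure from Robin's post.

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    """Factor by which an economy multiplies in one year, given its doubling time."""
    return 2.0 ** (1.0 / doubling_time_years)


# Fifteen-year doubling time, as stated in the quote.
current = annual_growth_factor(15.0)

# ASSUMPTION: "every few months" is taken to be 3 months (0.25 years).
fast = annual_growth_factor(3.0 / 12.0)

print(f"15-year doubling: economy multiplies by {current:.4f} per year")
print(f"3-month doubling: economy multiplies by {fast:.1f} per year")
```

Under that assumption, today's regime corresponds to growth of a bit under 5% per year, while the post-transition regime would multiply the economy sixteenfold every year.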

Robin is extrapolating from his table in Long-Term Growth As A Sequence of Exponential Modes.

I get that there’s a trend here. But I don’t get what inference rule Robin’s trend-continuation argument rests on.

Let’s say you have to predict whether dropping a single 100-megaton nuclear bomb on New York City is likely to cause complete human extinction. (For simplicity, assume it was just accidentally dropped by the US on home soil, not a war.)

As far as I know, the most reliably reality-binding kind of reasoning is mechanistic: Our predictions about what things are going to do rest on deduction from known rules and properties of causal models of those things.

We should obviously consider the causal implications of releasing 100 megatons worth of energy, and the economics of having a 300-mile-wide region wiped out.

Should we also consider that a nuclear explosion that decimates the world economy would proceed in minutes instead of years, thereby transitioning our current economic regime much faster than a decade, thus violating historical trends? I dunno, this trend-breaking seems totally irrelevant to the question of whether a single 100-megaton nuke could cause human extinction.

Am I just not applying Robin’s trend-breaking reasoning correctly? After all, previous major human economic transitions were always leaps forward in productivity, while this scenario involves a leap backward…

Ok, but what are the rules for this trend-extrapolation approach supposed to be? I have no idea when I’m allowed to apply it.

I suspect the only way to know a rule like “don’t apply economic-era extrapolation to reason about the risk of a single bomb causing human extinction” is to first cheat and analyze the situation using purely mechanistic reasoning. After that, if there’s a particular trend-extrapolation claim that feels on-topic, you can say it belongs in the mix of reasoning types that are supposedly applicable to the situation.

In our nuke example, there are two ways this could play out:

  1. If your first-pass mechanistic reasoning lands you far from what’s predicted by trend extrapolation, e.g. if it says every human on earth dies within minutes, then hey, we’re obviously talking about a freak event and not about extrapolating economic trends. Duh, economic models aren’t designed to talk about a one-off armageddon event. You have to pick the right model for the scenario you want to analyze! Can I interest you in a model of extinction events? Did you know we’re at the end of a 200-million-year cycle wherein the above-ground niche is due to be repopulated by previously underground-dwelling rodents? Now that’s a relevant trend.

  2. If your first-pass mechanistic reasoning lands you in the ballpark of a trend-extrapolation prediction, e.g. if it says that the main influence of a 100-megaton bomb on the economy would mostly be felt ten years after the blast event, and that economic activity would still be a thing, then you can wave in the trend-extrapolation methodology and advise that we ought to make some educated guesses about the post-blast world by reference to historical trends of human economic-era transitions.

To steel-man why trend extrapolation might ever be useful, I think back to the inside/outside view debates, like the famous case where your (biased) inside view of a project says you’ll finish it in a month, while the outside view says you’ll finish it in a year.

But to me, the tale of the planning fallacy is only a lesson about the value of taking compensatory action when you’re counteracting a known bias. I’m still not seeing why outside-view trend-extrapolation would be a kind of reasoning that has the power to constrain your expectations about reality in the general case.

Consider this argument:

  1. Our scientific worldview is built on the fundamental assumption that the future will be like the past

  2. Economic growth eras have proceeded at this rate in the past

  3. Ergo the next economic growth era will likely proceed that way

The argument fails because premise 1 is wrong. Scientific progress, as I understand it, is driven by mechanistic explanations, not by relating past observations to future observations by any kind of “likeness” metric. Progress comes from finding models that use fewer bits of information to predict larger categories of observations. Neither the timestamp of the observations nor their similarity to one another is directly relevant to the probability we should give to a model. I have a longer post about this here.

If I’m missing something, maybe Robin or someone else can write a more general explainer of how to operate reasoning by trend-extrapolation, and why they think it binds to reality in the general case.


Next, Robin points out that today we can, with some difficulty, keep our organizations sufficiently aligned with our values:

Coordination and control are hard [as demonstrated by today’s organizations]… but even so competition between orgs keeps them tolerable. That is, we mostly keep our orgs under control. Even though, compared to individual humans, large orgs are in effect “super-intelligences”.

I’ll grant that large orgs can be said to be somewhat superintelligent in the sense that we expect AIs to be, but I think AIs are going to be much more intelligent than that. The manageable difficulty of aligning a group of humans tells us very little about the difficulty of aligning an AI whose intelligence is much greater than that of the smartest contemporary human (or human organization).

I know Robin is skeptical about the claim that a software system can rapidly blow past the point where it sees planet Earth as a blue atomic rag doll, but that claim isn’t addressed in this recent post, and it’s a huge crux for me.


Robin sees the problem of controlling superintelligent AI as similar to the problem of controlling an organization:

The owners of [AI organizations]… are well advised to consider how best to control such ventures… but such efforts seem most effective when based on actual experience with concrete fielded systems. For example, there was little folks could do in the year 1500 to figure out how to control 20th century orgs, weapons, or other tech. Thus as we now know very little about the details of future AI-based ventures, leaders, or systems, we should today mostly either save resources to devote to future efforts, or focus our innovation efforts on improving control of existing ventures. Such as via decision markets.

I agree that control is complicated, and that our current knowledge about how to control AIs seems very inadequate, and that a valid analogy can be made to people in 1500 trying to plan for controlling 20th-century orgs.

But today’s AI risk situation doesn’t map to anything in the year 1500 if we consider all its salient aspects together:

  1. Control is complicated

  2. The thing we’re going to need to control is likely more intelligent than a team of 100 Von Neumanns thinking at 100 subjective seconds per second

  3. If the thing comes into existence and we’re not really good at controlling it, we likely go extinct

  4. This whole existential bottleneck of a scenario is likely to happen within a decade or two of our discussion

Aspect #1 is analogous to 1500, while aspects #2-4 aren’t at all.

Robin presumably chose to only address aspect #1 because he doesn’t believe #2-4 are true premises, and he’s just summarizing his own beliefs, not necessarily the crux of his disagreement with doomers like me. Much of Robin’s post is thus talking past us doomers.

E.g. this paragraph in his post isn’t relevant to the crux of the doomer argument:

Bio[logical] humans [controlling future AI-powered organizations] would be culturally distant, slower, and less competent than em [whole-brain emulation] AIs. And non-em AIs could be stranger, and thus even more culturally distant… Yes, periodically some ventures would suffer the equivalent of a coup. But if, like today, each venture were only a small part of this future world, bio humans as a whole would do fine. Ems, if they exist, could do even better.

As in his book The Age of Em, he’s talking about a world where we’re in the presence of superhuman AI and we haven’t been slaughtered. If that world ever exists for someone to analyze, then I must already have been proven wrong about my most important doom claims.

Robin does have things to say about the cruxier subjects in other posts. I recall that he’s previously elaborated on why he doesn’t expect AI to foom, with reference to observed trends in the software economy and software codebases. But these didn’t make it into the scope of his latest post.


Near the end of the post, he tries to more directly address the crux of his disagreement with doomers. He gives a summary of an AI doomer view that I’d say is fairly accurate. I’d give this a passing grade on the Ideological Turing Test:

A single small AI venture might stumble across a single extremely potent innovation, which enables it to suddenly “foom”, i.e., explode in power from tiny compared to the world economy, to more powerful than the entire rest of the world put together. (Including all the other AIs.)

...

Furthermore it is possible that even though this system was, before this explosion, and like most all computer systems today, very well tested to assure that its behavior was aligned well with its owners’ goals across its domains of usage, its behavior after the explosion would be nearly maximally non-aligned. (That is, orthogonal in a high dim space.) Perhaps resulting in human extinction. The usual testing and monitoring processes would be prevented from either noticing this problem or calling a halt when it so noticed, either due to this explosion happening too fast, or due to this system creating and hiding divergent intentions from its owners prior to the explosion.

Finally, we get some arguments that seem more valid and directed at the crux of the AI doomer worldview.

Robin argues that a foom scenario violates how economic competition normally works:

This [foom] scenario requires that this [AI] venture prevent other ventures from using its key innovation during this explosive period.

But I think being superintelligent lets you create your own super-productive economy from scratch, regardless of what the human economy looks like.

Robin argues that a superintelligent-AI-powered organization would have to solve internal coordination problems much better than large human organizations do:

It also requires that this new more powerful system not only be far smarter in most all important areas, but also be extremely capable at managing its now-enormous internal coordination problems.

But I think superintelligent AI’s powers dwarf the difficulty of coordinating itself.

Robin argues:

[The AI foom scenario] requires that this system not be a mere tool, but a full “agent” with its own plans, goals, and actions.

But I think superintelligent systems, if they’re not agenty on the surface, have an agenty subsystem and are therefore just a small modification away from being agenty.

Robin points out the lack of any historical precedent for “one tiny part [of the world] suddenly exterminating all the rest”. But I already think an intelligence explosion is destined to be a unique event in the history of the universe.


Finally, a couple of notable quotes near the end of Robin’s post don’t seem to pass the Ideological Turing Test.

Robin mentions that we’ve had a history of wrongly predicting that AI would automate human labor:

You might think that folks would take a lesson from our history of prior bursts of anxiety and concern about automation, bursts which have appeared roughly every three decades since at least the 1930s. Each time, new impressive demos revealed unprecedented capabilities, inducing a burst of activity and discussion, with many then expressing fear that a rapid explosion might soon commence, automating all human labor. They were, of course, very wrong.

But we AI doomers don’t see this as a data point to update on. We don’t see the impact of subhuman-general-intelligence AI as being relevant to our main concern. We believe there’s a critical AI capability threshold somewhere in the ballpark of human-level intelligence where we start sliding rapidly and uncontrollably toward the attractor state where AI permanently bricks the universe. Our situation in the present is that of a spaceship nearing the event horizon of a black hole, or a pile of uranium approaching the point where its neutron multiplication factor (k) exceeds 1.
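To illustrate why a threshold like this makes earlier non-events uninformative, here is a toy sketch of the criticality analogy. It assumes a deliberately simplified model in which each neutron generation is exactly k times the size of the previous one; the function name, the starting count of 1000, and the 200 generations are all made up for illustration, and none of this is real reactor physics.

```python
def neutron_population(k: float, generations: int, initial: float = 1000.0) -> float:
    """Neutron count after a number of generations in a toy model where
    each generation is k times the size of the previous one."""
    return initial * k ** generations


# Slightly below, exactly at, and slightly above the critical value k = 1.
for k in (0.95, 1.00, 1.05):
    count = neutron_population(k, generations=200)
    print(f"k = {k:.2f}: about {count:.3g} neutrons after 200 generations")
```

In this toy model, a small change in k around 1 flips the outcome from fizzling out to runaway growth. Observing that every pile so far has fizzled says little about what happens once k crosses 1.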

I was surprised to see this line because I don’t think it’s relevant at this point in the game to mention AI doomers invoking Pascal’s Wager:

Worriers often invoke a Pascal’s wager sort of calculus, wherein any tiny risk of this nightmare scenario could justify large cuts in AI progress.

The most common AI doom position, and the surveyed position of over a third of people working in the field of AI if I recall correctly, is that there’s at least a 5% chance of near-term AI existential risk, not a “tiny” chance.


My broader experience with Robin’s work is that his insights blow me away constantly. There’s just this one weird exception when he explains why AI risk isn’t that bad, and then I have the variety of confused and frustrated reactions that I’ve gone over in this post.

While it’s common for people to be skeptical about AI doom claims, I feel like Robin’s non-doomer position summarized in his post is noticeably uncommon. I rarely see anyone else support their non-doomer view using arguments similar to these. I especially don’t see people reasoning from human economic-era trends as Robin likes to do.

Of course I realize I might simply be wrong on this topic and he right. I hope at least one of us will be able to make a useful update.