Wizards and prophets of AI [draft for comment]

[Written for a general audience. You can probably skip the first section. Posted for feedback/comment before publication on The Roots of Progress. Decided not to publish as-is, although parts of this have been or may be used in other essays.]

Will AI kill us all?

That question is being debated seriously by many smart people at the moment. Following Charles Mann, I’ll call them the wizards and the prophets: the prophets think that the risk from AI is so great that we should actively slow or stop progress on it; the wizards disagree.

Why even discuss this?

(If you are already very interested in this topic, you can skip this section.)

Some of my readers will be relieved that I am finally addressing AI risk. Others will think that an AI apocalypse is classic hysterical pessimist doomerism, and they will wonder why I am even dignifying it with a response, let alone taking it seriously.

A few reasons:

It’s important to take safety seriously

Safety is a value. New technologies really do create risk, and the more powerful we get, the bigger the risk. Making technology safer is a part of progress, and we should celebrate it. Doomer pessimism is generally wrong, but so is complacent optimism. We should be prescriptive, not descriptive optimists, embracing solutionism over complacency.

We shouldn’t dismiss arguments based on vibes

Or mood affiliation, or who is making the argument, or what kind of philosophy they seem to be coming from. Our goal is to see the truth clearly. And the fact that doomer arguments have always been wrong in the past doesn’t mean that this one is.

The AI prophets are not typical doomers

They are generally pro-technology, pro-human, and not fatalistic. Nor are they prone to authoritarianism; many lean libertarian. And their arguments are intelligent and thoroughly thought-out.

Many of the arguments against them are bad

Many of the people arguing against them (none of whom are mentioned in this post) are not thinking clearly and are being fairly sloppy.

So I want to address this.

The argument

I boil it down to three main claims:

AI will become a superintelligent agent

It will be far smarter than any human being, quantitatively if not qualitatively. And some forms of the AI will have goal-directed behavior.

This does not require computers to be conscious (merely that they be able to do things that right now only conscious beings can do). It does not require them to have a qualitatively different form of “intelligence”: it could be enough for them to be as smart as a brilliant human, able to read everything ever written and have perfect recall of it, able to think 1000x faster, able to fork into teams that work on things simultaneously, etc.

The AI’s goals will not be aligned with ours

This is the principal-agent problem again. Whatever it is aiming at will not be exactly what we want. We won’t be able to give it perfect instructions. We will not be able to train it to obey the law. We won’t even be able to train it to follow basic human morality, like “don’t kill everyone.”

This does not require it to have free will to choose its goals, or otherwise to depart from following the training we have given it. Like a genie or a monkey’s paw, it might do exactly what we ask for, in a way that is not at all what we wanted—following the letter of our instructions, but destroying the spirit.

All our prevention and countermeasures will fail

If we test AI in a box before letting it out into the real world, our tests will miss crucial problems. If we try to keep it in a box forever, it will talk its way out (and by the way, we’re not even trying to do that). If we try to limit the AI’s power, it will evade those limitations. If we try to turn it off, it will stop us. If we try to use some AIs as police to watch the other AIs, they will instead collude with each other and conspire against us. In fact, the AI might anticipate all of the above and conclude that the easiest path is simply to launch a sneak attack on humanity and kill us all to get us out of the way.

And whatever happens might happen so fast that we don’t get a chance to learn from failure. There will be no Hindenburg or Tacoma Narrows Bridge or Chernobyl as a cautionary example. There will be no warning shot, no failed robot uprising. The very first time AI takes action against us, it will wipe us all out.

Analogies

In “Four lenses on AI risks”, I gave the analogy that AI might be like expansionary Western empires when they clashed with other civilizations, or like humans when they arrived on the evolutionary scene, wiping out the Neanderthals and hunting many megafauna to extinction.

A related argument is that if you would be worried about an advanced alien civilization coming to Earth, you should worry about AI.

What’s different this time

People have always been worried that new technologies would cause catastrophe. But so far, technology has done far more good than harm overall. What might be different this time?

Relatedly, why worry about AI instead of an asteroid impact, an antibiotic-resistant superbug, etc.?

The crux is the power of intelligence. Humans have been able so far to overcome every challenge because of the power of our intelligence. We can beat natural disasters: drought and famine, storm and flood. We can beat wild animals. We can beat bacteria and viruses. We can make cars, planes, drugs, and X-rays safe. Nature is no match for us because intelligence trumps everything. David Deutsch says that “anything not forbidden by the laws of nature is achievable, given the right knowledge.”

If AI goes rogue, we are for the first time up against an intelligent adversary. We’re not mastering indifferent nature; we’re potentially up against something that has a world-model, that can create and execute plans.

Arguably, the more optimistic you are about the ability of humans to overcome any challenge, the more worried you should be about any non-human thing gaining that same ability.

The crux is epistemic

Why do smart people disagree so much on this?

Eliezer Yudkowsky is certain we are doomed. Zvi Mowshowitz thinks it’s very likely. Scott Alexander gives it a 33% chance (which means we still have a 2/3 chance to survive!). On the other hand, Scott Aaronson implies that his probability is under 2%; Tyler Cowen says that we just can’t know; and Steven Pinker is dismissive of all the arguments.

I think the deepest crux here is epistemological: how well do we understand this issue, how much can we say about it, and what can we predict?

The prophets think that, based on the nature of intelligence, the entire argument above is obviously correct. Most of the argument can be boiled down to a simple syllogism: the superior intelligence is always in control; as soon as AI is more intelligent than we are, we are no longer in control.

The wizards think that we are more in a realm of Knightian uncertainty. There are too many unknown unknowns. We can’t make any confident projections of what will happen. Any attempt to do so is highly speculative. If we were to give equal weight to all hypotheses with equal evidence, there would be an epistemically unmanageable combinatorial explosion of scenarios to consider.

There is then a further disagreement about how to talk about such scenarios. Adherents of Bayesian epistemology want to put a probability on everything, no matter how far removed from evidence. Neo-Popperians like David Deutsch think that even suggesting such probabilities is irrational, that attempting inferences beyond the “reach” of our best explanations is unwarranted—appropriately, the term Popper used for this was “prophecy.”

Eliezer thinks that this is like orbital mechanics: we see an asteroid way out in the distance, we calculate its trajectory, we know from physics that it is going to destroy the Earth.

Why I’m skeptical of the prophecy

Orbital mechanics is very simple and well-understood. The situation with AI is complex and poorly understood.

What could a superintelligence really do? The prophets’ answer seems to be “pretty much anything.” Any sci-fi scenario you can imagine, like “diamondoid bacteria that infect all humans, then simultaneously release botulinum toxin.” In this view, as intelligence increases without limit, it approaches omnipotence. But this is not at all obvious to me.

The same view is behind the argument that all our prevention and countermeasures will fail: the AI will outsmart you, manipulate you, outmaneuver you, etc. As Scott Aaronson points out, this is a “fully general counterargument” to anything that might work.

When we think about Western empires or alien invasions, what makes one side superior is not raw intelligence, but the results of that intelligence compounded over time, in the form of science, technology, infrastructure, and wealth. The same is true of humans and animals: our dominance comes from tools and accumulated knowledge, and an unaided human is no match for most large animals. AI, no matter how intelligent, will not start out with a compounding advantage.

And will we really have no ability to learn from mistakes? One of the prophets’ worries is “fast takeoff”, the idea that AI progress could go from ordinary to godlike literally overnight (perhaps through “recursive self-improvement”). But in reality, we seem to be seeing a “slow takeoff,” as some form of AI has arrived and we actually have time to talk and worry about it (even though Eliezer claims that fast takeoff has not yet been invalidated).

If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.

Proceed, with caution

We always have to act, even in the face of uncertainty—even Knightian uncertainty.

We also have to remember that the potential advantages of AI are as great as its risks. If it is as powerful as its worst critics fear, then it is also powerful enough to give us abundant clean energy, cheap manufacturing and construction, fast and safe transportation, and cures for all diseases. Remember that no matter what, we’re all going to die eventually, until and unless we cure aging itself.

If we did see an alien fleet approaching us, would we try to hide? If they weren’t even on course for us, but were going to pass us by, would we stay silent, or call out to them? Personally, I would want to meet them and to learn from them. And yes, without some evidence of hostile intent on their part, I would risk our civilization rather than pass up that defining moment.

Scott Aaronson defines someone’s “Faust parameter” as “the maximum probability they’d accept of an existential catastrophe in order that we should all learn the answers to all of humanity’s greatest questions,” adding “I confess that my Faust parameter might be as high as 0.02.” I sympathize.

None of the above means “damn the torpedoes, full speed ahead.” Testing and AI safety work are all valuable. It is good to occasionally hold an Asilomar conference. It’s good to think through the safety implications of new developments before even working on them, as Kevin Esvelt did for the gene drive. We can pursue “reform” rather than “orthodox” AI safety. (And note that OpenAI spent several months testing GPT-4 before its release.)

So, proceed with caution. But proceed.