AI Fire Alarm Scenarios
I’ve been pondering whether we’ll get any further warnings about when AI(s) will exceed human levels at general-purpose tasks, and about whether doing so will entail enough risk that AI researchers ought to take some precautions. I feel pretty uncertain about this.
I haven’t even been able to make useful progress at clarifying what I mean by that threshold of general intelligence.
As a weak substitute, I’ve brainstormed a bunch of scenarios describing not-obviously-wrong ways in which people might notice, or fail to notice, that AI is transforming the world.
I’ve given probabilities for each scenario, which I’ve pulled out of my ass and don’t plan to defend.
These probabilities add up to more than 100%, since they’re not mutually exclusive. I’ll estimate an additional 15% chance that we get a fire alarm some other way, and a 25% chance that we get neither an alarm nor an interesting story about what prevented the alarm.
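As a rough tally, here’s a minimal sketch that sums the headline probabilities from the scenarios below (counting only the “valid alarm” figures plus the 15% “some other way” estimate; the short labels are just my bookkeeping, and false-alarm probabilities are deliberately left out):

```python
# Headline "valid alarm" probabilities from the scenarios below,
# plus the 15% "some other way" estimate. False-alarm probabilities
# are excluded from this tally.
scenario_probabilities = {
    "Expert Opinion": 0.15,
    "Venture Capital": 0.20,
    "Corporate Budgets": 0.20,
    "Military Arms Race": 0.08,
    "Project Apollo Analogy": 0.05,
    "Wright Brothers Analogy": 0.15,
    "Worms": 0.10,
    "Malicious AI Assistants": 0.30,
    "Free Guy": 0.08,
    "Age of Em": 0.08,
    "AI Politicians": 0.001,
    "AIs Manipulate Politics": 0.05,
    "Turing Test, Snooze Button Edition": 0.05,
    "Some other way": 0.15,
}

total = sum(scenario_probabilities.values())
print(f"Sum of scenario probabilities: {total:.1%}")  # about 164%, well over 100%
```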
What do I mean by “a fire alarm”? Roughly, something that causes 90% of AI researchers to sound more worried than Brian Christian currently sounds.
Please don’t infer too much from this post about the importance of fire alarms. Katja has written an appropriately nuanced evaluation of fire alarms—most people should read just the conclusions. Also, I’m not predicting whether an AI fire alarm would reduce the risks associated with AI.
I’ll start with some relatively mundane scenarios, and build up to stranger ones.
Expert Opinion
New AI researchers and/or research teams come along every couple of years with new accomplishments impressive enough that most AI researchers recognize them as being among the top three or four most promising sources of progress toward AI. The world gradually notices that each of these new leaders expresses at least as much concern about the risks of AI as prior researchers, and that they increasingly agree that AI will pass human levels sometime in the 2030s.
By the late 2020s, there’s a clear enough consensus among the recognized AI experts that no major news organization is willing to deny that AI represents an important source of near-term risk.
Probability: 15%.
Venture Capital
Venture capitalists have above-average competence at noticing new technologies, and they’re relatively alert to low-probability outcomes.
I find it easy to imagine that a couple of years before we get AGI, we’ll see VCs making multi-billion dollar investments in brand new AI companies, and large tech companies occasionally acquiring them for tens of billions.
That doesn’t mean that if we see such a pattern we should believe there’s more than a 50% chance of AGI within two years—false alarms are quite possible. It does mean that many people will get excited about that possibility, even if it’s only for the purpose of taking money from VCs.
Probability: 20% chance of a valid alarm, and a 15% chance of a false alarm.
Corporate Budgets
Companies such as DeepMind and OpenAI might throw around compute budgets of, say, $100 billion in one year.
Probability: 20% chance of a valid alarm, and a 10% chance of a false alarm.
Military Arms Race
Military leaders become convinced that human-level AI is 5-10 years away, given a Manhattan Project-like effort, and that it will give an important military advantage to any country that achieves it.
This likely involves widespread expert agreement that throwing lots of compute at the problem will be an important factor in succeeding.
I’d prefer to have no fire alarm than to have this kind of alarm.
Probability: 8%.
Project Apollo Analogy
This scenario resembles the prior one, but leading nations are more clearly peaceful, and they’re more optimistic that they will be able to copy (or otherwise benefit from?) the first AI.
A leading nation (let’s say Canada) launches a massive AI project that hires the best and the brightest, and devotes more money to compute than could reasonably be expected from any other organization that the best and brightest would be willing to work for.
This is almost by definition a scenario in which we get an alarm that something of historic importance is imminent, but in which influential people mostly downplay the risks.
Probability: 5%.
The Wright Brothers Analogy
H.G. Wells’ 1907 book The War in the Air predicted that flying machines would be built. Although he knew a lot about attempts to build them, Wells didn’t know that the Wright brothers had succeeded in 1903. There were occasional published reports of the Wright brothers’ success, but the shortage of credible witnesses left most authorities doubting their substance. Who knows how many other inventors had flown by then without getting credit?
I presume a fair amount of that doubt was due to the long litany of failed approaches, which left most people tired of trying to evaluate claims that needed unusually favorable conditions to replicate.
AI will have a very different set of evaluation problems, but they might produce a similar result.
In this scenario, I imagine that any individual AI will take years to incorporate sufficiently broad and general knowledge to match the best humans, much as human children do. Thus there will be a significant period when one or more AIs are able to exceed human abilities in a moderate number of fields, with that number of fields growing at a moderate and hard-to-measure pace.
I expect this scenario would produce massive confusion about whether to be alarmed.
Probability: 15%.
Worms
The smoke surrounding the OpenAI Codex may be coming from an important fire.
The novel Daemon hints at the risks of automated malware.
Stuxnet-like attacks could become more common if software can productively rewrite itself. Some of these will mistakenly cause broader effects than intended.
My guess is that the maliciousness will be blamed more on the humans who create such worms than on any AIs that are involved.
Probability: 10% chance of this causing an AI fire alarm, 60% chance that major blow-ups will happen without the resulting alarms becoming focused on AI.
Conspicuously Malicious AI Assistants
Paul Christiano suggests: “AI employees will embezzle funds, AI assistants will threaten and manipulate their users, AI soldiers will desert. … The base rate will remain low but there will be periodic high-profile blow-ups.”
Subtle versions of this will be common, and not different enough from current social media manipulation to cause much more alarm than we’ve already got. I’m not willing to bet on how blatant the scariest cases will be.
Many cases will be like racist chatbots: all of the harm can plausibly be blamed on malicious people. Nevertheless, ‘Satya Nadella, the CEO of Microsoft, said that Tay “has had a great influence on how Microsoft is approaching AI.”’
We’ll have robocars taking passengers on routes that aren’t what the passengers want. Maybe taking overly long routes that avoid passing donut shops that might tempt the passengers. Maybe taking them on routes that pass by homeless camps to raise their consciousness. But I don’t quite see this kind of example leading up to a major blow-up.
Maybe something like Alexa will buy more addictive junk food than users want?
I expect we’ll get something like full-fledged AI employees causing a few spectacular problems, but not far enough in advance of full human-level AI to qualify as much of a warning.
Financial assistants may buy into Ponzi schemes. I’m guessing this won’t cause much alarm, due to the difficulty of distinguishing between incompetence and misalignment.
I expect many borderline cases in this area. I can’t quite see how much alarm they’ll cause.
Probability: 30% chance that blow-ups like this will cause a fire alarm, and an additional 30% chance that several such blow-ups will get a good deal of publicity without generating a clear fire alarm.
Free Guy
An NPC from a video game does something which has large effects on the business that owns the game. Maybe bullying players, maybe exposing the company’s trade secrets, maybe showing players how to get similar or better entertainment at lower cost. Conceivably going on strike for better NPC working conditions.
This scenario is inspired by the movie Free Guy, which significantly exceeded my expectations.
Probability: 8%.
Age of Em
Mind uploading might happen before human-level AI. How would that affect expectations?
Many people will react by saying this proves human intelligence has unique, inscrutable features which AI researchers can’t replicate. But I don’t foresee that having much effect on the beliefs of AI researchers, so I’ll treat that as not too important.
It will speed up some approaches to AI development, but will also provide some evidence that intelligence is messy, in a way that resists fast improvement. I’ll guess these two factors offset each other, producing little net change in expectations.
It will refute some beliefs of the form “intelligence requires more computing power than we can afford”. How much that changes beliefs will depend a lot on how fast the uploads run.
It will cause significant alarm about the threat of digital intelligence outcompeting existing workers. It might cause AI researchers to worry about their own jobs. That will likely be the main way in which this scenario increases concern about AI risk. That would likely focus the alarm on non-existential AI risks.
Probability of uploads causing AI alarm: 8%.
AI Politicians
The world panics when an AI unexpectedly gets elected as president of the US, on a platform of a welfare system that seems generous enough to convince most humans to quit their jobs. The S&P 500 drops 50% in 2 weeks, as half of all businesses can’t figure out whether they’ll need to shut down for lack of workers.
Yet three months later, the new welfare system is only attractive enough to convince 3% of US workers to quit. Businesses resume their normal spending, and the US returns to the usual disputes, with 50% of the country insisting that racism is the root of all evil, and 45% insisting that the biggest evil is AIs threatening human dignity by manipulating us into thinking they’re US citizens.
A handful of eccentric people argue about whether the new president is smarter than humans, and also about whether she is smarter than other well-known AIs. But they’re largely dismissed, on the grounds that they believe smartness can be measured by IQ, which is clearly a racist affront to human dignity.
Probability: 0.1%.
AIs Manipulate Politics
The previous scenario faces an obvious obstacle: the eligibility rules for becoming president would need to be changed.
So let’s modify it to have the AI act as an advisor to a politician. Let’s say that the next Dominic Cummings is secretly an AI. We won’t get good information about how much influence AI-Cummings will have on political decisions, so I expect more confusion than in the previous scenario.
The people who first express concern about this will likely be branded as conspiracy theorists, and banned from Twitter and Facebook. Maybe the AIs will create accounts that sound like crackpots in order to ensure that we expect these concerns to be mistaken.
I don’t expect that to fully suppress awareness of the problems, as the politicians who use these AIs will occasionally help their allies to understand how to use the AIs. The AIs or their owners will also spread awareness to politicians who might want to purchase these services.
The amount of money spent on resulting arms races between politicians will likely spread awareness that something interesting is happening, but I expect enough secrecy, smoke, and mirrors that most people, including most AI experts, will not see a clear enough pattern to generate anything that I’d call a fire alarm.
Probability: 5%.
Turing Test Alarm, Snooze Button Edition
GPT-7 passes a Turing test, but isn’t smart enough to do anything dangerous. People interpret that as evidence that AI is safe.
I don’t mean the half-assed Turing tests that are typically performed. I’m thinking of a test where the judges have been selected for skill at judging this type of contest, with large enough rewards for correct answers that the test will attract competent judges.
Or maybe the test consists of having GPT-7, disguised as a human, apply for remote jobs at leading tech companies. In this scenario, GPT-7 gets jobs about as often as humans do, but is rarely able to hold them for more than a few weeks.
If GPT-7 is strongly hyped before these results are known, then policy makers, and maybe some of the less sophisticated AI developers, will take that as proof that humans have unique abilities that software can’t match.
Probability: 5%.
See also Bayeswatch.