AI Fire Alarm Scenarios
I’ve been pondering whether we’ll get any further warnings about when AI(s) will exceed human levels at general-purpose tasks, and about whether doing so will entail enough risk that AI researchers ought to take some precautions. I feel pretty uncertain about this.
I haven’t even been able to make useful progress at clarifying what I mean by that threshold of general intelligence.
As a weak substitute, I’ve brainstormed a bunch of scenarios describing not-obviously-wrong ways in which people might notice, or fail to notice, that AI is transforming the world.
I’ve given probabilities for each scenario, which I’ve pulled out of my ass and don’t plan to defend.
These probabilities add up to more than 100%, since they’re not mutually exclusive. I’ll estimate an additional 15% chance that we get a fire alarm some other way, and a 25% chance that we get neither an alarm nor an interesting story about what prevented the alarm.
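As a rough tally, here’s a minimal sketch that sums the headline probabilities from the scenarios below (counting only the “valid alarm” figures plus the 15% “some other way” estimate; the short labels are just my bookkeeping, and false-alarm probabilities are deliberately left out):

```python
# Headline "valid alarm" probabilities from the scenarios below,
# plus the 15% "some other way" estimate. False-alarm probabilities
# are excluded from this tally.
scenario_probabilities = {
    "Expert Opinion": 0.15,
    "Venture Capital": 0.20,
    "Corporate Budgets": 0.20,
    "Military Arms Race": 0.08,
    "Project Apollo Analogy": 0.05,
    "Wright Brothers Analogy": 0.15,
    "Worms": 0.10,
    "Malicious AI Assistants": 0.30,
    "Free Guy": 0.08,
    "Age of Em": 0.08,
    "AI Politicians": 0.001,
    "AIs Manipulate Politics": 0.05,
    "Turing Test, Snooze Button Edition": 0.05,
    "Some other way": 0.15,
}

total = sum(scenario_probabilities.values())
print(f"Sum of scenario probabilities: {total:.1%}")  # about 164%, well over 100%
```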
What do I mean by “a fire alarm”? Roughly, something that causes 90% of AI researchers to sound more worried than Brian Christian currently sounds.
Please don’t infer too much from this post about the importance of fire alarms. Katja has written an appropriately nuanced evaluation of fire alarms—most people should read just the conclusions. Also, I’m not predicting whether an AI fire alarm would reduce the risks associated with AI.
I’ll start with some relatively mundane scenarios, and build up to stranger ones.
Expert Opinion
New AI researchers and/or research teams come along every couple of years with new accomplishments impressive enough that most AI researchers recognize them as being among the top three or four most promising sources of progress toward AI. The world gradually notices that each of these new leaders expresses at least as much concern about the risks of AI as prior researchers, and that they increasingly agree that AI will pass human levels sometime in the 2030s.
By the late 2020s, there’s a clear enough consensus among the recognized AI experts that no major news organization is willing to deny that AI represents an important source of near-term risk.
Probability: 15%.
Venture Capital
Venture capitalists have above-average competence at noticing new technologies, and they’re relatively alert to low-probability outcomes.
I find it easy to imagine that a couple of years before we get AGI, we’ll see VCs making multi-billion dollar investments in brand new AI companies, and large tech companies occasionally acquiring them for tens of billions.
That doesn’t mean that if we see such a pattern we should believe there’s more than a 50% chance of AGI within two years—false alarms are quite possible. It does mean that many people will get excited about that possibility, even if it’s only for the purpose of taking money from VCs.
Probability: 20% chance of a valid alarm, and a 15% chance of a false alarm.
Corporate Budgets
Companies such as DeepMind and OpenAI might throw around compute budgets of, say, $100 billion in one year.
Probability: 20% chance of a valid alarm, and a 10% chance of a false alarm.
Military Arms Race
Military leaders become convinced that human-level AI is 5-10 years away, given a Manhattan Project-like effort, and that it will give an important military advantage to any country that achieves it.
This likely involves widespread expert agreement that throwing lots of compute at the problem will be an important factor in succeeding.
I’d prefer to have no fire alarm than to have this kind of alarm.
Probability: 8%.
Project Apollo Analogy
This scenario resembles the prior one, but leading nations are more clearly peaceful, and they’re more optimistic that they will be able to copy (or otherwise benefit from?) the first AI.
A leading nation (let’s say Canada) launches a massive AI project that hires the best and the brightest, and devotes more money to compute than could reasonably be expected from any other organization that the best and brightest would be willing to work for.
This is almost by definition a scenario in which we get an alarm that something of historic importance is imminent, but in which influential people mostly downplay the risks.
Probability: 5%.
The Wright Brothers Analogy
H.G. Wells’ 1907 book The War in the Air predicted that flying machines would be built. Although he knew a lot about attempts to build them, Wells didn’t know that the Wright brothers had succeeded in 1903. There were occasional published reports of the Wright brothers’ success, but the shortage of credible witnesses left most authorities doubting their substance. Who knows how many other inventors had flown by then without getting credit?
I presume a fair amount of that doubt was due to the long litany of failed approaches, which left most people tired of trying to evaluate claims that needed unusually favorable conditions to replicate.
AI will have a very different set of evaluation problems, but they might produce a similar result.
In this scenario, I imagine that any individual AI will take years to incorporate sufficiently broad and general knowledge to match the best humans, much as human children do. Thus there will be a significant period when one or more AIs are able to exceed human abilities in a moderate number of fields, with that number of fields growing at a moderate and hard-to-measure pace.
I expect this scenario would produce massive confusion about whether to be alarmed.
Probability: 15%.
Worms
The smoke surrounding the OpenAI Codex may be coming from an important fire.
The novel Daemon hints at the risks of automated malware.
Stuxnet-like attacks could become more common if software can productively rewrite itself. Some of these will mistakenly cause broader effects than intended.
My guess is that the maliciousness will be blamed more on the humans who create such worms than on any AIs that are involved.
Probability: 10% chance of this causing an AI fire alarm, 60% chance that major blow-ups will happen without the resulting alarms becoming focused on AI.
Conspicuously Malicious AI Assistants
Paul Christiano suggests: “AI employees will embezzle funds, AI assistants will threaten and manipulate their users, AI soldiers will desert. … The base rate will remain low but there will be periodic high-profile blow-ups.”
Subtle versions of this will be common, and not different enough from current social media manipulation to cause much more alarm than we’ve already got. I’m not willing to bet on how blatant the scariest cases will be.
Many cases will be like racist chatbots: all of the harm can plausibly be blamed on malicious people. Nevertheless, ‘Satya Nadella, the CEO of Microsoft, said that Tay “has had a great influence on how Microsoft is approaching AI.”’
We’ll have robocars taking passengers on routes that aren’t what the passengers want. Maybe taking overly long routes that avoid passing donut shops that might tempt the passengers. Maybe taking them on routes that pass by homeless camps to raise their consciousness. But I don’t quite see this kind of example leading up to a major blow-up.
Maybe something like Alexa will buy more addictive junk food than users want?
I expect we’ll get something like full-fledged AI employees causing a few spectacular problems, but not far enough in advance of full human-level AI to qualify as much of a warning.
Financial assistants may buy into Ponzi schemes. I’m guessing this won’t cause much alarm, due to the difficulty of distinguishing between incompetence and misalignment.
I expect many borderline cases in this area. I can’t quite see how much alarm they’ll cause.
Probability: 30% chance that blow-ups like this will cause a fire alarm, and an additional 30% chance that several such blow-ups will get a good deal of publicity without generating a clear fire alarm.
Free Guy
An NPC from a video game does something which has large effects on the business that owns the game. Maybe bullying players, maybe exposing the company’s trade secrets, maybe showing players how to get similar or better entertainment at lower cost. Conceivably going on strike for better NPC working conditions.
This scenario is inspired by the movie Free Guy, which significantly exceeded my expectations.
Probability: 8%.
Age of Em
Mind uploading might happen before human-level AI. How would that affect expectations?
Many people will react by saying this proves human intelligence has unique, inscrutable features which AI researchers can’t replicate. But I don’t foresee that having much effect on the beliefs of AI researchers, so I’ll treat that as not too important.
It will speed up some approaches to AI development, but will also provide some evidence that intelligence is messy, in a way that resists fast improvement. I’ll guess these two factors offset each other, producing little net change in expectations.
It will refute some beliefs of the form “intelligence requires more computing power than we can afford”. How much that changes beliefs will depend a lot on how fast the uploads run.
It will cause significant alarm about the threat of digital intelligence outcompeting existing workers. It might cause AI researchers to worry about their own jobs. That will likely be the main way in which this scenario increases concern about AI risk. That would likely focus the alarm on non-existential AI risks.
Probability of uploads causing AI alarm: 8%.
AI Politicians
The world panics when an AI unexpectedly gets elected as president of the US, on a platform of a welfare system that seems generous enough to convince most humans to quit their jobs. The S&P 500 drops 50% in 2 weeks, as half of all businesses can’t figure out whether they’ll need to shut down for lack of workers.
Yet three months later, the new welfare system is only attractive enough to convince 3% of US workers to quit. Businesses resume their normal spending, and the US returns to the usual disputes, with 50% of the country insisting that racism is the root of all evil, and 45% insisting that the biggest evil is AIs threatening human dignity by manipulating us into thinking they’re US citizens.
A handful of eccentric people argue about whether the new president is smarter than humans, and also about whether she is smarter than other well-known AIs. But they’re largely dismissed, on the grounds that they believe smartness can be measured by IQ, which is clearly a racist affront to human dignity.
Probability: 0.1%.
AIs Manipulate Politics
The previous scenario faces an obvious obstacle: the eligibility rules for becoming president would need to be changed.
So let’s modify it to have the AI act as an advisor to a politician. Let’s say that the next Dominic Cummings is secretly an AI. We won’t get good information about how much influence AI-Cummings will have on political decisions, so I expect more confusion than in the previous scenario.
The people who first express concern about this will likely be branded as conspiracy theorists, and banned from Twitter and Facebook. Maybe the AIs will create accounts that sound like crackpots in order to ensure that we expect these concerns to be mistaken.
I don’t expect that to fully suppress awareness of the problems, as the politicians who use these AIs will occasionally help their allies to understand how to use the AIs. The AIs or their owners will also spread awareness to politicians who might want to purchase these services.
The amount of money spent on resulting arms races between politicians will likely spread awareness that something interesting is happening, but I expect enough secrecy, smoke, and mirrors that most people, including most AI experts, will not see a clear enough pattern to generate anything that I’d call a fire alarm.
Probability: 5%.
Turing Test Alarm, Snooze Button Edition
GPT-7 passes a Turing test, but isn’t smart enough to do anything dangerous. People interpret that as evidence that AI is safe.
I don’t mean the half-assed Turing tests that are typically performed. I’m thinking of a test where the judges have been selected for skill at judging this type of contest, with large enough rewards for correct answers that the test will attract competent judges.
Or maybe the test consists of having GPT-7, disguised as a human, apply for remote jobs at leading tech companies. In this scenario, GPT-7 gets jobs about as often as humans do, but is rarely able to hold them for more than a few weeks.
If GPT-7 is strongly hyped before these results are known, then policy makers, and maybe some of the less sophisticated AI developers, will take that as proof that humans have unique abilities that software can’t match.
Probability: 5%.
See also Bayeswatch.