Q&A with Richard Carrier on risks from AI
[Click here to see a list of all interviews]
I am emailing experts in order to raise academic awareness of risks from AI and to estimate how such risks are currently perceived.
Richard Carrier is a world-renowned author and speaker. As a professional historian, published philosopher, and prominent defender of the American freethought movement, Dr. Carrier has appeared across the country and on national television defending sound historical methods and the ethical worldview of secular naturalism. His books and articles have also received international attention. He holds a Ph.D. from Columbia University in ancient history, specializing in the intellectual history of Greece and Rome, particularly ancient philosophy, religion, and science, with emphasis on the origins of Christianity and the use and progress of science under the Roman empire. He is best known as the author of Sense and Goodness without God, Not the Impossible Faith, and Why I Am Not a Christian, and a major contributor to The Empty Tomb, The Christian Delusion, The End of Christianity, and Sources of the Jesus Tradition, as well as writer and editor-in-chief (now emeritus) for the Secular Web, and for his copious work in history and philosophy online and in print. He is currently working on his next books, Proving History: Bayes’s Theorem and the Quest for the Historical Jesus, On the Historicity of Jesus Christ, The Scientist in the Early Roman Empire, and Science Education in the Early Roman Empire. To learn more about Dr. Carrier and his work follow the links below.
Homepage: richardcarrier.info
Blog: freethoughtblogs.com/carrier/ (old blog: richardcarrier.blogspot.com)
Selected articles:
Bayes’ Theorem: Lust for Glory! (Blog post and video talk)
“Bayes’ Theorem for Beginners: Formal Logic and Its Relevance to Historical Method” (Paper)
The Interview:
Richard Carrier: Note that I follow and support the work of The Singularity Institute on precisely this issue, which you are writing for (if you are a correspondent for Less Wrong). And I believe all AI developers should (e.g. CALO). So my answers won’t be too surprising (below). But also keep in mind what I say (not just on “singularity” claims) at:
http://richardcarrier.blogspot.com/2009/06/are-we-doomed.html
Q1: Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of roughly human-level machine intelligence?
Richard Carrier: 2020/2040/2080
Explanatory remark to Q1:
P(human-level AI by (year) | no wars ∧ no disasters ∧ beneficial political and economic development) = 10%/50%/90%
Q2: What probability do you assign to the possibility of human extinction as a result of badly done AI?
Richard Carrier: Here the relative probability is much higher that human extinction will result from benevolent AI, i.e. eventually Homo sapiens will be self-evidently obsolete and we will voluntarily transition to Homo cyberneticus. In other words, we will extinguish the Homo sapiens species ourselves, voluntarily. If you asked for a 10%/50%/90% deadline for this I would say 2500/3000/4000.
However, perhaps you mean to ask regarding the extinction of all Homo, and their replacement with AI that did not originate as a human mind, i.e. the probability that some AI will kill us and just propagate itself.
The answer to that is dependent on what you mean by “badly done” AI: (a) AI that has more power than we think we gave it, causing us problems, or (b) AI that has so much more power than we think we gave it that it can prevent our taking its power away.
(a) is probably inevitable, or at any rate a high probability, and there will likely be deaths or other catastrophes, but like other tech failures (e.g. the Titanic, Three Mile Island, hijacking jumbo jets and using them as guided missiles) we will prevail, and very quickly from a historical perspective (e.g. there won’t be another 9/11 using airplanes as missiles; we only got jacked by that unforeseen failure once). We would do well to prevent as many problems as possible by being as smart as we can be about implementing AI, and not underestimating its ability to outsmart us, or to develop while we aren’t looking (e.g. Siri could go sentient on its own, if no one is managing it closely to ensure that doesn’t happen).
(b) is very improbable because AI function is too dependent on human cooperation (e.g. power grid; physical servers that can be axed or bombed; an internet that can be shut down manually) and any move by AI to supplant that requirement would be too obvious and thus too easily stopped. In short, AI is infrastructure dependent, but it takes too much time and effort to build an infrastructure, and even more an infrastructure that is invulnerable to demolition. By the time AI has an independent infrastructure (e.g. its own robot population worldwide, its own power supplies, manufacturing plants, etc.) Homo sapiens will probably already be transitioning to Homo cyberneticus and there will be no effective difference between us and AI.
However, given no deadline, it’s likely there will be scenarios like: “god” AIs run sims in which digitized humans live, and any given god AI could decide to delete the sim and stop running it (and likewise all comparable AI shepherding scenarios). So then we’d be asking how likely it is that a god AI would ever do that, and more specifically, that all would (since there won’t be just one sim run by one AI, but many, so one going rogue would not mean extinction of humanity).
So setting aside AI that merely kills some people, and only focusing on total extinction of Homo sapiens, we have:
P(voluntary human extinction by replacement | any AGI at all) = 90%+
P(involuntary human extinction without replacement | badly done AGI type (a)) < 10^-20
[and that’s taking into account an infinite deadline, because the probability steeply declines with every year after first opportunity, e.g. AI that doesn’t do it the first chance it gets is rapidly less likely to as time goes on, so the total probability has a limit even at infinite time, and I would put that limit somewhere as here assigned.]
P(involuntary human extinction without replacement | badly done AGI type (b)) = .33 to .67
However, P(badly done AGI type (b)) < 10^-20
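To make the combined estimate explicit, here is a minimal arithmetic sketch (mine, not from the interview) that multiplies the conditional probability above by the prior assigned to a type (b) AGI arising at all; taking 0.5 as the midpoint of the .33–.67 range is my assumption.

```python
# Illustrative arithmetic only: combining the two stated estimates into an
# unconditional probability of involuntary extinction via a type (b) AGI.
# The 0.5 midpoint of the .33-.67 range is an assumption; 1e-20 is the stated upper bound.

p_ext_given_b = 0.5   # P(involuntary extinction without replacement | badly done AGI type (b))
p_b = 1e-20           # P(badly done AGI type (b))

p_ext_via_b = p_ext_given_b * p_b
print(f"P(involuntary extinction via type (b) AGI) <= {p_ext_via_b:.1e}")  # ~5e-21
```

On these numbers the type (b) route to extinction comes out around 5 × 10^-21, i.e. the same order of magnitude as the bound given for type (a).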
Explanatory remark to Q2:
P(human extinction | badly done AI) = ?
(Where ‘badly done’ = AGI capable of self-modification that is not provably non-dangerous.)
Q3: What probability do you assign to the possibility of a human-level AGI self-modifying its way up to massive superhuman intelligence within a matter of hours/days/< 5 years?
Richard Carrier: Depends on when it starts. For example, if we started a human-level AGI tomorrow, its ability to revise itself would be hugely limited by our slow and expensive infrastructure (e.g. manufacturing the new circuits, building the mainframe extensions, supplying them with power, debugging the system). In that context, “hours” and “days” have P --> 0, but 5 years has P = 33%+ if someone is funding the project, and likewise 10 years has P = 67%+; and 25 years, P = 90%+. However, suppose human-level AGI is first realized in fifty years, when all these things can be done in a single room with relatively inexpensive automation and the power demands of any new system are not greater than are normally supplied to that room. Then P(days) = 90%+. And with massively more advanced tech, say such as we might have in 2500, then P(hours) = 90%+.
However...
Explanatory remark to Q3:
P(superhuman intelligence within hours | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
P(superhuman intelligence within days | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
P(superhuman intelligence within < 5 years | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
Richard Carrier: Perhaps you are confusing intelligence with knowledge. An internet connection can make no difference to the former (since an AGI will have no more control over the internet than human operators do); it can only expand a mind’s knowledge. As to how quickly, that will depend more on the rate of processed seconds in the AGI itself, i.e. if it can simulate human thought only at the same pace as non-AI, then it will not be able to learn any faster than a regular person, no matter what kind of internet connection it has. But if the AGI can process ten seconds of time in one second of non-AI time, then it can learn ten times as fast, up to the limit of data access (and that is where internet connection speed will matter).

That is a calculation I can’t do. A computer science expert would have to be consulted to calculate reasonable estimates of what connection speed would be needed to learn at ten times normal human pace, assuming the learner can learn that fast (which a ten-to-one time processor could); likewise a hundred times, etc.

And all that would tell you is how quickly that mind can learn. But learning in and of itself doesn’t make you smarter. That would require software or circuit redesign, which would require testing and debugging. Otherwise, once you had all relevant knowledge available to any human software/circuit design team, you would simply be no smarter than them, and further learning would not help you (humans already have that knowledge level: that’s why we work in teams to begin with), so AI is not likely to much exceed us in that ability. The only edge it can exploit is the speed of a serial design thought process, but even that runs up against the time and resource expense of testing and debugging anything it designed, and that is where physical infrastructure slows the rate of development, and massive continuing human funding is needed. Hence my probabilities above.
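As a rough illustration of the kind of estimate being asked for, here is a back-of-the-envelope sketch using hypothetical numbers (an assumed human reading pace and bytes-per-word figure, neither taken from the interview) to show how the required bandwidth scales with the speed-up factor:

```python
# Back-of-the-envelope sketch with hypothetical numbers (not from the interview):
# how much bandwidth would a learner running k times human pace need,
# if it ingests text at k times an assumed human reading rate?

WORDS_PER_MIN_HUMAN = 300      # assumed human reading speed (hypothetical)
BYTES_PER_WORD = 6             # assumed average word length in bytes, incl. whitespace (hypothetical)
BITS_PER_SEC_HUMAN = WORDS_PER_MIN_HUMAN / 60 * BYTES_PER_WORD * 8  # ~240 bit/s

def required_bandwidth_bits_per_sec(speedup: float) -> float:
    """Bandwidth needed to feed text to a learner running `speedup` times human pace."""
    return BITS_PER_SEC_HUMAN * speedup

for k in (10, 100, 1_000_000):
    print(f"{k:>9,}x speed-up: ~{required_bandwidth_bits_per_sec(k):,.0f} bit/s")

# Under these assumptions even a million-fold speed-up needs only ~240 Mbit/s for text,
# far below a 100 Gigabit connection, consistent with the point that processing pace,
# not connection speed, would be the binding constraint for this kind of learning.
```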
Q4: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?
Richard Carrier: Yes. At the very least it is important to take the risks very seriously and incorporate them as a concern within every project flow. I believe there should always be someone expert in the matter assigned to any AGI design team, who is monitoring everything being done and assessing its risks and ensuring safeguards are in place before implementation at each step. It already concerns me that this might not be a component of the management of Siri, and Siri achieving AGI is a low probability (but not vanishingly low; I’d say it could be as high as 1% in 10 years unless Siri’s processing space is being deliberately limited so it cannot achieve a certain level of complexity, or its cognitive abilities are otherwise being actively limited).
Explanatory remark to Q4:
How much money is currently required to mitigate possible risks from AI (to be instrumental in maximizing your personal long-term goals, e.g. surviving this century): less, no more, a little more, much more, or vastly more than is currently being spent?
Richard Carrier: Required is not very much. A single expert monitoring Siri who has real power to implement safeguards would be sufficient, so with salary and benefits and collateral overhead, that’s no more than $250,000/year, for a company that has billions in liquid capital. (Because safeguards are not expensive, e.g. capping Siri’s processing space costs nothing in practical terms; likewise writing her software to limit what she can actually do no matter how sentient she became, e.g. imagine an army of human hackers hacked Siri at the source and could run Siri by a million direct terminals, what could they do? Answering that question will evoke obvious safeguards to put on Siri’s physical access and software; the most obvious is making it impossible for Siri to rewrite her own core software.)
But what actually is being spent I don’t know. I suspect “a little more” needs to be spent than is, only because I get the impression AI developers aren’t taking this seriously, and yet the cost of monitoring is not that high.
And yet you may notice all this is separate from the question of making AGI “provably friendly” which is what you asked about (and even that is not the same as “provably safe” since friendly AGI poses risks as well, as the Singularity Institute has been pointing out).
This is because all we need do now is limit AGI’s power at its nascence. Then we can explore how to make AGI friendly, and then provably friendly, and then provably safe. In fact I expect AGI will even help us with that. Once AGI exists, the need to invest heavily in making it safe will be universally obvious. Whereas before AGI exists there is little we can do to ascertain how to make it safe, since we don’t have a working model to test. Think of trying to make a ship safe, without ever getting to build and test any vessel, nor having knowledge of any other vessels, and without knowing anything about the laws of buoyancy. There wouldn’t be a lot you could do.
Nevertheless it would be worth some investment to explore how much we can now know, particularly as it can be cross-purposed with understanding human moral decision making better, and thus need not be sold as “just AI morality” research. How much more should we spend on this now? Much more than we are. But only because I see that money benefiting us directly, in understanding how to make ordinary people better, and detect bad people, and so on, which is of great value wholly apart from its application to AGI. Having it double as research on how to design moral thought processes unrestrained by human brain structure would then benefit any future AGI development.
Q5: Do possible risks from AI outweigh other possible existential risks, e.g. risks associated with the possibility of advanced nanotechnology?
Explanatory remark to Q5:
What existential risk (human extinction type event) is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?
Richard Carrier: All existential risks are of such vastly low probability it would be beyond human comprehension to rank them, and utterly pointless to do so anyway. And even if I were to rank them, extinction by comet, asteroid, or cosmological gamma ray burst vastly outranks any manmade cause. Even extinction by supervolcano vastly outranks any manmade cause. So I don’t concern myself with this (except to call for more investment in Earth-impactor detection and the monitoring of supervolcano risks).
We should be concerned not with existential risks, but ordinary risks, e.g. small scale nuclear or biological terrorism, which won’t kill the human race, and might not even take civilization into the Dark Ages, but can cause thousands or millions to die and have other bad repercussions. Because ordinary risks are billions upon billions of times more likely than extinction events, and as it happens, mitigating ordinary risks entails mitigating existential risks anyway (e.g. limiting the ability to go nuclear prevents small scale nuclear attacks just as well as nuclear annihilation events, in fact it makes the latter billions of times less likely than it already is).
Thus when it comes to AI, as an existential risk it just isn’t one (P --> 0), but as a panoply of ordinary risks, it is (P --> 1). And it doesn’t matter how it ranks, it should get full attention anyway, like all definite risks do. It thus doesn’t need to be ranked against other risks, as if terrorism were such a great risk we should invest nothing in earthquake safety, or vice versa.
Q6: What is the current level of awareness of possible risks from AI, relative to the ideal level?
Richard Carrier: Very low. Even among AI developers it seems.
Q7: Can you think of any milestone such that if it were ever reached you would expect human-level machine intelligence to be developed within five years thereafter?
Richard Carrier: There will not be “a” milestone like that, unless it is something wholly unexpected (like a massive breakthrough in circuit design that allows virtually infinite processing power on a desktop: which development would make P(AGI within five years) > 33%). But wholly unexpected discoveries have a very low probability. Sticking only with what we already expect to occur, the five-year milestone for AGI will be AHI, artificial higher intelligence, e.g. a robot cat that behaved exactly like a real cat. Or a Watson that can actively learn on its own without being programmed with data (but still can only answer questions, and not plan or reason out problems). The CALO project is likely to develop an increasingly sophisticated Siri-like AI that won’t be AGI but will gradually become more and more like AGI, so that there won’t be any point where someone can say “it will achieve AGI within 5 years.” Rather it will achieve AGI gradually and unexpectedly, and people will even debate when or whether it had.
Basically, I’d say once we have “well-trained dog” level AI, the probability of human-level AI becomes:
P(< 5 years) = 10%
P(< 10 years) = 25%
P(< 20 years) = 50%
P(< 40 years) = 90%
I… really don’t see how this can be justified.
It is extraordinary. Even if you set aside the kinds of existential risks we tend to discuss here, several experts on climate change (e.g. James Hansen) consider runaway global warming sufficient to destroy all life on Earth a real risk.
Climate gurus often get funding by being alarmist, though. They want to get paid so they can save the world—but first the world must be at risk. Thus all the climate scaremongering. It’s a real problem.
If they don’t really believe that the world is at risk, why aren’t they getting paid more doing something else?
I didn’t mean to suggest that they don’t believe in what they are saying.
As far as I understand the phenomenon, DOOM-sayers are normally perfectly sincere.
Check out The End of The World Cult, for example. The prophets of DOOM are not kidding.
Hmm. Information theoretic extinction seems pretty unlikely to me. Humanity will live on in the history “books” about major transitions—and the “books” at that stage will no doubt be pretty fancy—with multiple “instantiated” humans.
I don’t think that’s very likely either, but 10^-20 seems to be an overconfident probability for it.
A rather bizarre view, IMHO. I think that only a few would agree with this.
Some of those probabilities are wildly overconfident. Less than 10^-20 for badly done superintelligence and badly done somewhat-less-superintelligence wiping out humanity? Ordinary risks are “billions upon billions” of times more likely than existential risks? Maybe that one could work if every tornado that killed ten people was counted under “ordinary risks,” but it’s still overconfident. If he thinks things on the scale of “small nuclear war or bioterrorism” are billions of times more likely than existential risks, he’s way overconfident.
That was: “P(involuntary human extinction without replacement | badly done AGI type (a)) < 10^-20.”
“AGI type (a)” was previously defined to be: “AI that has more power than we think we gave it, causing us problems.”
So, what we may be seeing here is fancy footwork based on definitions.
If “a” = “humans win” then P(humans lose | a) may indeed be very small.
This suggests that he sees the limiting factor for AI as hardware; however, I’ve heard people argue that we probably already have the hardware needed for human-level AI if we get the software right (and I’m pretty sure that was before things like cloud computing were so easily available).
I wonder how likely he thinks it is that a single organisation today (maybe Google?) already has the hardware required to run a human-level AI at the same speed as the human brain. Assuming we magically solved all the software problems.
We do, but it’s not cost effective or fast enough—so humans are cheaper and (sometimes) better. Within a decade, the estimated hardware may cost around 100 USD and the performance difference won’t be there. Sometime around then, things seem likely to get more interesting.
This POV, while radically different from EY’s, makes a lot more sense to me, and probably to most ordinary people. I wonder if there is a believable mathematical calculation either way.
Can you describe in what way this makes sense to you? Do you mean that it doesn’t instantly clash with your intuition or do you mean that you have some reason to believe it?
You can actually quantify many external existential risks, give probabilities and error bars. No believable estimate has been done for man-made existential risks, because we have never (yet) destroyed ourselves on a grand enough scale.
Am I reading you right? You seem to be arguing using the form: “X has never happened, therefore we cannot produce a believable probability estimate for X.”
But since when does an event have to occur in order for us to get a reasonable probability estimate?
What if we look at one salient example? How about assessments of the probability of a global nuclear war? Any decent assessment would provide a reasonable lower bound for a man-made human extinction event. In addition to the more recent article I linked to, don’t you suppose that RAND or another group ever devised a believable estimate of the likelihood of extinction via global nuclear war some time between the 1950s and 1989?
It seems hard to believe that nuclear war alone wouldn’t have provided a perpetual lower bound of greater than 10% on a man-made extinction scenario during most of the Cold War. Even now, this lower bound, though smaller, doesn’t seem to be totally negligible (hence the ongoing risk assessments and advocacy for disarmament).
Even if it were the case that natural risks greatly outweighed the risk of man-made extinction events, the conclusion that we needn’t concern ourselves with the latter doesn’t follow (I’m assuming this part of the POV made sense to you as well), given my counter-example. Of course you might have a good reason to reject my counter-example, and if so I’d be interested in seeing it.
No, it wouldn’t. One needs a probability of extinction conditional on global nuclear war (generally considered quite unlikely). Perhaps this might happen if it turns out that the Industrial Revolution is a fluke that could not be repeated without fossil fuels, or if nuclear winter was extraordinarily severe (the authors of the recent nuclear winter papers think it very unlikely that a global nuclear war using current arsenals could cause extinction), or if nuclear-driven collapse prevented us from deflecting an extinction-level asteroid, but there’s a further step in the argument. I think reasonable assignments of probabilities will still give you more nuclear existential risk in the next century than risk from natural disasters, but the analysis will depend on right-tail outcomes and model uncertainty.
Is there anything, in particular, you do consider a reasonably tight lower bound for a man-made extinction event? If so, would you be willing to explain your reasoning?
Mega-scale asteroid impacts (dinosaur-killer size) come close. Uncertainty there would be about whether we could survive climatic disruptions better than the dinosaurs did (noting that fish, crocodiles, mammals, birds, etc, survived) using technology.
This doesn’t really answer the “man-made” part of “man-made extinction event” (unless you know something about mad scientists with ion engines mounted on large asteroids that I don’t know).
Sorry, I misread your question. I don’t think we have rigid uncontroversial frequentist estimates for any man-made extinction event. There are estimates I would say are unreasonably low, but there will be a step along the lines of “really?!? You seriously assign less than a 1 in 1 billion probability to there being a way for bioweapons programs of the next 50 years to create a set of overlapping long-latency high virulence pathogens that would get all of humanity, in light of mousepox and H5N1 experiments, the capabilities of synthetic biology, the results of these expert elicitations, etc?”
Hellman says:
It seems to be challenging to figure out what the chances of even this much milder event are.
Of course, ignorance should not lead to complacency.
What? Seriously? Most ordinary people think that human-caused extinction events are vastly less likely than natural ones? But… that’s crazy! I must really be out of touch.
I suspect it’s one of those cases where the answer you get from most people depends on how you ask the question… that is, on what kinds of priming/anchoring effects are in play.