A list of good heuristics that the case for AI x-risk fails
I think one reason machine learning researchers don’t think AI x-risk is a problem is that they haven’t given it the time of day. And on some level, they may be right in not doing so!
We all need to do meta-level reasoning about what to spend our time and effort on. Even giving an idea or argument the time of day requires it to cross a somewhat high bar, if you value your time. Ultimately, in evaluating whether it’s worth considering a putative issue (like the extinction of humanity at the hands (graspers?) of a rogue AI), one must rely on heuristics; by giving the argument the time of day, you’ve already conceded a significant amount of resources to it! Moreover, you risk privileging the hypothesis or falling victim to Pascal’s Mugging.
Unfortunately, the case for x-risk from out-of-control AI systems seems to fail many powerful and accurate heuristics. At first glance, this can put proponents of the issue in a position similar to that of flat-earth conspiracy theorists. My goal here is to enumerate heuristics that arguments for AI takeover scenarios fail.
Ultimately, I think machine learning researchers should not refuse to consider AI x-risk when presented with a well-made case by a person they respect or have a personal relationship with, but I’m ambivalent as to whether they have an obligation to consider the case if they’ve only seen a few headlines about Elon. I do find it a bit hard to understand how one doesn’t end up thinking about the consequences of super-human AI, since it seems obviously impactful and fascinating. But I’m a very curious (read “distractible”) person…
A list of heuristics that say not to worry about AI takeover scenarios:
Outsiders not experts: This concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus.
Ludditism has a poor track record: For every new technology, there’s been a pack of alarmist naysayers and doomsday prophets. And then instead of falling apart, the world got better.
ETA: No concrete threat model: When someone raises a hypothetical concern, but can’t give you a good explanation for how it could actually happen, it’s much less likely to actually happen. Is the paperclip maximizer the best you can do?
It’s straight out of science fiction: AI researchers didn’t come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies.
It’s not empirically testable: There’s no way to falsify the belief that AI will kill us all. It’s purely a matter of faith. Such beliefs don’t have good track records of matching reality.
It’s just too extreme: Whenever we hear an extreme prediction, we should be suspicious. To the extent that extreme changes happen, they tend to be unpredictable. While extreme predictions sometimes contain a seed of truth, reality tends to be more mundane and boring.
It has no grounding in my personal experience: When I train my AI systems, they are dumb as doorknobs. You’re telling me they’re going to be smarter than me? In a few years? So smart that they can outwit me, even though I control the very substrate of their existence?
It’s too far off: It’s too hard to predict the future and we can’t really hope to anticipate specific problems with future AI systems; we’re sure to be surprised! We should wait until we can envision more specific issues, scenarios, and threats, not waste our time on what comes down to pure speculation.
I’m pretty sure this list is incomplete, and I plan to keep adding to it as I think of or hear new suggestions! Suggest away!!
Also, to be clear, I am writing these descriptions from the perspective of someone who has had very limited exposure to the ideas underlying concerns about AI takeover scenarios. I think a lot of these reactions indicate significant misunderstandings about what people working on mitigating AI x-risk believe, as well as matters of fact (e.g. a number of experts have voiced concerns about AI x-risk, and a significant portion of the research community seems to agree that these concerns are at least somewhat plausible and important).
Here’s another: AI being x-risky makes me the bad guy.
That is, if I’m an AI researcher and someone tells me that AI poses x-risks, I might react by seeing this as someone telling me I’m a bad person for working on something that makes the world worse. This is bad for me because I derive important parts of my sense of self from being an AI researcher: it’s my profession, my source of income, my primary source of status, and a huge part of what makes my life meaningful to me. If what I am doing is bad or dangerous, that threatens to take much of that away (if I also want to think of myself as a good person, I either have to stop doing AI work to avoid being bad or stop thinking of myself as good), and an easy solution is to dismiss the arguments.
This is more generally a kind of motivated cognition or rationalization, but I think it’s worth considering a specific mechanism because it better points towards ways you might address the objection.
This doesn’t seem like it belongs on a “list of good heuristics”, though!
Another important improvement I should make: rephrase these to have the type signature of “heuristic”!
I pushed this post out since I think it’s good to link to it in this other post. But there are at least 2 improvements I’d like to make and would appreciate help with:
Is there a better reference for “a number of experts have voiced concerns about AI x-risk”? I feel like there should be by now...
I just realized it would be nice to include examples where these heuristics lead to good judgments.
I helped make this list in 2016 for a post by Nate, partly because I was dissatisfied with Scott’s list (which includes people like Richard Sutton, who thinks worrying about AI risk is carbon chauvinism):
These days I’d probably make a different list, including people like Yoshua Bengio. AI risk stuff is also sufficiently in the Overton window that I care more about researchers’ specific views than about “does the alignment problem seem nontrivial to you?”. Even if we’re just asking the latter question, I think it’s more useful to list the specific views and arguments of individuals (e.g., note that Rossi is more optimistic about the alignment problem than Russell), list the views and arguments of the similarly prominent CS people who think worrying about AGI is silly, and let people eyeball which people they think tend to produce better reasons.
I hope someone actually answers your question, but FWIW, the Asilomar principles were signed by an impressive list of prominent AI experts. Five of the items are related to AGI and x-risk. The statements aren’t really strong enough to declare that those people “voiced concerns about AI x-risk”, but it’s a data-point for what can be said about AI x-risk while staying firmly in the mainstream.
My experience in casual discussions is that it’s enough to just name one example to make the point, and that example is of course Stuart Russell. When talking to non-ML people—who don’t know the currently-famous AI people anyway—I may also mention older examples like Alan Turing, Marvin Minsky, or Norbert Wiener.
Thanks for this nice post. :-)
Yeah I’ve had conversations with people who shot down a long list of concerned experts, e.g.:
Stuart Russell is GOFAI ==> out-of-touch
Shane Legg doesn’t do DL, does he even do research? ==> out-of-touch
Ilya Sutskever (and everyone at OpenAI) is crazy, they think AGI is 5 years away ==> out-of-touch
Anyone at DeepMind is just marketing their B.S. “AGI” story or drank the koolaid ==> out-of-touch
But then, even the big 5 of deep learning have all said things that can be used to support the case....
So it kind of seems like there should be a compendium of quotes somewhere, or something.
Sounds like their problem isn’t just misleading heuristics, it’s motivated cognition.
Oh sure, in some special cases. I don’t think this experience was particularly representative.
Sort of related to a couple of points you already brought up (not in personal experience, outsiders not experts, science fiction): worrying about AI x-risk is also weird, i.e. it’s not a thing everyone else is worrying about, so publicly worrying about it spends some of your weirdness-points, and most people have very low weirdness budgets (because of not enough status to afford more weirdness, low psychological openness, etc.).
There is an issue of definition here. There are categories of scenario where it is unclear whether they constitute an “AI takeover”, even though there is recognition of a real and likely risk of some type. Almost everyone stakes out a position at one of the binary extremes of outcome, good or bad, without much consideration for the plausible quasi-equilibrium states in the middle that fall out of some risk models. For researchers working in the latter camp, it will feel a bit like a false dichotomy.
As another heuristic: the inability to arrive at a common set of elementary computational assumptions, grounded in physics, from which AI risk models are derived is itself sufficient reason to be skeptical of any particular AI risk model, without knowing much else.
Psychologically, worry is a negative prediction about the future. On this view, a healthy person would not worry about things that have extremely low probability, can’t be predicted, or can be predicted but can’t be acted upon. I’d say AI x-risk can easily be perceived as falling into at least one of those categories.
I think this list is interesting and potentially useful, and I think I’m glad you put it together. I also generally think it’s a good and useful norm for people to seriously engage with the arguments they (at least sort-of/overall) disagree with.
But I’m also a bit concerned about how this is currently presented. In particular:
This is titled “A list of good heuristics that the case for AI x-risk fails”.
The heuristics themselves are stated as facts, not as something like “People may believe that...” or “Some claim that...” (using words like “might” could also help).
A comment of yours suggests you’ve already noticed this. But I think it’d be pretty quick to fix.
Your final paragraph, a very useful caveat, comes after listing all the heuristics as facts.
I think these things will have relatively small downsides, given the likely quite informed and attentive audience here. But a bunch of psychological research I read a while ago (2015-2017) suggests there could be some degree of downside.
Based on that sort of research (for a tad more info on it, see here), I’d suggest:
Renaming this to something like “A list of heuristics that suggest the case for AI x-risk is weak” (or even “fails”, if you’ve said something like “suggest” or “might”)
Rephrasing the heuristics to be stated as disputable (or even false) claims, rather than facts. E.g., “Some people may believe that this concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus.” ETA: Putting them in quote marks might be another option for that.
Moving what’s currently the final paragraph caveat to before the list of heuristics.
Perhaps also adding sub-points about the particularly disputable dot points. E.g.:
“(But note that several AI experts have now voiced concern about the possibility of major catastrophes from advanced AI systems, although there’s still no consensus on this.)”
I also recognise that several of the heuristics really do seem good, and probably should make us at least somewhat less concerned about AI. So I’m not suggesting trying to make the heuristics all sound deeply flawed. I’m just suggesting perhaps being more careful not to end up with some readers’ brains, on some level, automatically processing all of these heuristics as definite truths that definitely suggest AI x-risk isn’t worthy of attention.
Sorry for the very unsolicited advice! It’s just that preventing gradual slides into false beliefs (including from well-intentioned efforts that do actually contain the truth in them!) is sort of a hobby-horse of mine.
Also, one other heuristic/proposition that, as far as I’m aware, is simply factually incorrect (rather than “flawed but in debatable ways” or “actually pretty sound”) is “AI researchers didn’t come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies.” So there it may also be worth pointing out, in some manner, that in reality, quite early on, prominent AI researchers raised concerns somewhat similar to those discussed now.
E.g., I. J. Good apparently wrote in 1959: