Against most, but not all, AI risk analogies
I personally dislike most AI risk analogies that I've seen people use. While I think analogies can be helpful for explaining concepts to people and illustrating mental pictures, I think they are frequently misused, and often harmful. At the root of the problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And a large fraction of the time[1] when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI even when no such credible model exists.
Here is a random list of examples of analogies that I found in the context of AI risk (note that I’m not saying these are bad in every context):
Stuart Russell: “It’s not exactly like inviting a superior alien species to come and be our slaves forever, but it’s sort of like that.”
Rob Wiblin: “It’s a little bit like trying to understand how octopuses are going to think or how they’ll behave — except that octopuses don’t exist yet, and all we get to do is study their ancestors, the sea snail, and then we have to figure out from that what’s it like to be an octopus.”
Eliezer Yudkowsky: “The character this AI plays is not the AI. The AI is an unseen actress who, for now, is playing this character. This potentially backfires if the AI gets smarter.”
Nate Soares: “My guess for how AI progress goes is that at some point, some team gets an AI that starts generalizing sufficiently well, sufficiently far outside of its training distribution, that it can gain mastery of fields like physics, bioengineering, and psychology [...] And in the same stroke that its capabilities leap forward, its alignment properties are revealed to be shallow, and to fail to generalize. The central analogy here is that optimizing apes for inclusive genetic fitness (IGF) doesn’t make the resulting humans optimize mentally for IGF.”
Norbert Wiener: “when a machine constructed by us is capable of operating on its incoming data at a pace which we cannot keep, we may not know, until too late, when to turn it off. We all know the fable of the sorcerer’s apprentice...”
Geoffrey Hinton: “It’s like nuclear weapons. If there’s a nuclear war, we all lose. And it’s the same with these things taking over.”
Joe Carlsmith: “I think a better analogy for AI is something like an engineered virus, where, if it gets out, it gets harder and harder to contain, and it’s a bigger and bigger problem.”
Ajeya Cotra: “Corporations might be a better analogy in some sense than the economy as a whole: they’re made of these human parts, but end up pretty often pursuing things that aren’t actually something like an uncomplicated average of the goals and desires of the humans that make up this machine, which is the Coca-Cola Corporation or something.”
Ezra Klein: “As my colleague Ross Douthat wrote, this is an act of summoning. The coders casting these spells have no idea what will stumble through the portal.”
SKLUUG: “AI risk is like Terminator! AI might get real smart, and decide to kill us all! We need to do something about it!”
These analogies cover a wide scope, and many of them can indeed sometimes be useful in conveying meaningful information. My point is not that they are never useful, but rather that these analogies can be shallow and misleading. The analogies establish almost nothing of importance about the behavior and workings of real AIs, but nonetheless give the impression of a model for how we should think about AIs.
And notice how these analogies can give an impression of a coherent AI model even when the speaker is not directly asserting it to be a model. Regardless of the speaker’s intentions, I think the actual effect is frequently to plant a detailed-yet-false picture in the audience’s mind, giving rise to specious ideas about how real AIs will operate in the future. Because the similarities are so shallow, reasoning from these analogies will tend to be unreliable.
A central issue here is that these analogies are frequently chosen selectively — picked on the basis of evoking a particular favored image, rather than on the basis of identifying the most natural point of comparison possible. Consider this example from Ajeya Cotra:
Rob Wiblin: I wanted to talk for a minute about different analogies and different mental pictures that people use in order to reason about all of these issues. [...] Are there any other mental models or analogies that you think are worth highlighting?
Ajeya Cotra: Another analogy that actually a podcast that I listen to made — it’s an art podcast, so did an episode on AI as AI art started to really take off — was that it’s like you’re raising a lion cub, or you have these people who raise baby chimpanzees, and you’re trying to steer it in the right directions. And maybe it’s very cute and charming, but fundamentally it’s alien from you. It doesn’t necessarily matter how well you’ve tried to raise it or guide it — it could just tear off your face when it’s an adult.
Is there any reason why Cotra chose “chimpanzee” as the point of comparison when “golden retriever” would have been equally valid? It’s hard to know, but plausibly, she didn’t choose golden retriever because that would have undermined her general thesis.
I agree that if her goal was to convey the logical possibility of misalignment, then the analogy to chimpanzees works well. But if her goal was to convey the plausibility of misalignment, or anything like a “mental model” of how we should think of AI, I see no strong reason to prefer the chimpanzee analogy over the golden retriever analogy. The mere fact that one analogy evokes a negative image and the other evokes a positive image seems, by itself, no basis for any preference in their usage.
Or consider the analogy to human evolution. If you are trying to convey the logical possibility of inner misalignment, the analogy to human evolution makes sense. But if you are attempting to convey the plausibility of inner misalignment, or a mental model of inner misalignment, why not choose instead to analogize the situation to within-lifetime learning among humans? Indeed, as Quintin Pope has explained, the evolution analogy seems to have some big flaws:
“human behavior in the ancestral environment” versus “human behavior in the modern environment” isn’t a valid example of behavioral differences between training and deployment environments. Humans weren’t “trained” in the ancestral environment, then “deployed” in the modern environment. Instead, humans are continuously “trained” throughout our lifetimes (via reward signals and sensory predictive error signals). Humans in the ancestral and modern environments are different “training runs”.
As a result, human evolution is not an example of:
We trained the system in environment A. Then, the trained system processed a different distribution of inputs from environment B, and now the system behaves differently.
It’s an example of:
We trained a system in environment A. Then, we trained a fresh version of the same system on a different distribution of inputs from environment B, and now the two different systems behave differently.
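To make the distinction concrete, here is a minimal toy sketch of my own (this is my illustration, not Quintin’s; the one-dimensional regression “system” is an arbitrary stand-in): the first case trains one model and then evaluates that same model off-distribution, while the second fits a fresh model to each environment and compares the two.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(env_mean, n=200):
    # A toy 1-D regression task; env_mean stands in for "the environment".
    x = rng.normal(env_mean, 1.0, size=(n, 1))
    y = 2.0 * x[:, 0] + rng.normal(0.0, 0.1, size=n)
    return x, y

def train(x, y):
    # Ordinary least squares with a bias term.
    X = np.hstack([x, np.ones((len(x), 1))])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, x):
    X = np.hstack([x, np.ones((len(x), 1))])
    return X @ w

x_A, y_A = make_data(env_mean=0.0)   # "ancestral" environment
x_B, y_B = make_data(env_mean=5.0)   # "modern" environment

# Case 1: train in environment A, then deploy the *same* trained system
# on inputs from environment B (a genuine train/deploy distribution shift).
w_deployed = train(x_A, y_A)
shift_error = np.mean((predict(w_deployed, x_B) - y_B) ** 2)

# Case 2: two *fresh* training runs, one per environment. Quintin's claim
# is that ancestral vs. modern humans are analogous to this case, not Case 1.
w_run_A = train(x_A, y_A)
w_run_B = train(x_B, y_B)

print("Case 1, off-distribution error of the deployed model:", shift_error)
print("Case 2, separately trained parameters:", w_run_A, w_run_B)
```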
Many proponents of AI risk already seem quite happy to critique plenty of AI analogies, such as the anthropomorphic analogy, or “it’s like a toaster” and “it’s like Google Maps”. And of course, in these cases, we can easily identify the flaws:
Ajeya Cotra: I think the real disanalogy between Google Maps and all of this stuff and AI systems is that we are not producing these AI systems in the same way that we produced Google Maps: by some human sitting down, thinking about what it should look like, and then writing code that determines what it should look like.
To be clear, I agree Google Maps is a bad analogy, and it should rightly be criticized. But is the chimp analogy really so much better? Shouldn’t we be applying the same degree of rigor against our own analogies too?
My point is not simply “use a different analogy”. My point is that we should largely stop relying on analogies in the first place. Use detailed object-level arguments instead!
ETA: To clarify, I’m not against using analogies in every case. I’m mostly just wary of having our arguments depend on analogies, rather than detailed models. See this footnote for more information about how I view the proper use of analogies.[1]
While the purpose of analogies is to provide knowledge in place of ignorance — to explain an insight or a concept — I believe many AI risk analogies primarily misinform or confuse people rather than enlighten them; they can insert unnecessary false assumptions in place of real understanding. The basic concept they are intended to convey may be valuable to understand, but riding along with that concept is a giant heap of additional speculation.
Part of this is that I don’t share other people’s picture of what AIs will actually look like in the future. This is only a small part of my argument, because my main point is that we should rely much less on arguments by analogy, rather than switch to different analogies that convey different pictures. But this difference in how I view the future still plays a significant role in my frustration at the usage of AI risk analogies.
Maybe you think, for example, that the alien and animal analogies are great for reasons that I’m totally missing. But it’s still hard for me to see that. At the very least, let me lay out my own picture for comparison, and maybe you can see where I’m coming from.
Again: The next section is not an argument. It is a deliberately evocative picture, to help compare my expectations of the future against the analogies I cited above. My main point in this post is that we should move away from a dependence on analogies, but if you need a “picture” of what I expect from AI, to compare it to your own, here is mine.
The default picture, as I see it — the thing that seems to me like a straightforward extrapolation of current trends in 2024 into the medium-term future, as AIs match and begin to slightly exceed human intelligence — looks nothing like the caricatures evoked by most of the standard analogies. In contrast to the AIs-will-be-alien model, I expect AIs will be born directly into our society, deliberately shaped by us, for the purpose of filling largely human-shaped holes in our world. They will be socially integrated with us and will likely substantially share our concepts about the social and physical world, having been trained on our data and being fluent in our languages. They will be numerous and everywhere, interacting with us constantly, assisting us, working with us, and even providing friendship to hundreds of millions of people. AIs will be evaluated, inspected, and selected by us, and their behavior will be determined directly by our engineering.
I feel this picture is a relatively simple extension of existing trends, with LLMs already being trained to be kind and helpful to us, and to collaborate with us, having first been shaped by our combined cultural output. I expect this trend of assimilation into our society will intensify in the foreseeable future, as there will be consumer demand for AIs that people can trust and want to interact with. Progress will likely be incremental rather than appearing suddenly with the arrival of a super-powerful agent. And perhaps most importantly, I expect oversight and regulation will increase dramatically over time as AIs begin having large-scale impacts.
It is not my intention to paint a picture of uniform optimism here. There are still plenty of things that can go wrong in the scenario I have presented. And much of it is underspecified because I simply do not know what the future will bring. But at the very least, perhaps you can now sympathize with my feeling that most existing AI risk analogies are deeply frustrating, given my perspective.
Again, I am not claiming analogies have no place in AI risk discussions. I’ve certainly used them a number of times myself. But I think they can be, and frequently are, used carelessly, and they seem to regularly slip various incorrect illustrations of how future AIs will behave into people’s mental models, even without any intent from the person making the analogy. In my opinion, it would be a lot better if, overall, we reduced our dependence on AI risk analogies and substituted specific object-level points in their place.
[1] To be clear, I’m not against all analogies. I think that analogies can be good if they are used well in context. More specifically, analogies generally serve one of three purposes:
1. Explaining a novel concept to someone
2. Illustrating, or evoking a picture of a thing in someone’s head
3. An example in a reference class, to establish a base rate, or otherwise form the basis of a model

I think that in cases (1) and (2), analogies are generally bad as arguments, even if they might be good for explaining something. They’re certainly not bad if you’re merely trying to tell a story, or convey how you feel about a problem, or convey how you personally view a particular thing in your own head.
In case (3), I think analogies are generally weak arguments, until they are made more rigorous. Moreover, when the analogy is used selectively, it is generally misleading. The rigorous way of setting up this type of argument is to deliberately try to search for all relevant examples in the reference class, without discriminating in favor of ones that merely evoke your preferred image, to determine the base rate.
And yet you immediately use an analogy to make your model of AI progress more intuitively digestible and convincing:
That evokes the image of entities not unlike human children. The language following this line only reinforces that image, and thereby sneaks in an entire cluster of children-based associations. Of course the progress will be incremental! It’ll be like the change of human generations. And they will be “socially integrated with us”, so of course they won’t grow up to be alien and omnicidal! Just like our children don’t all grow up to be omnicidal. Plus, they...
That sentence only sounds reassuring because the reader is primed with the model of AIs-as-children. Having lots of social-bonding time with your child, and having them interact with the community, is good for raising happy children who grow up how you want them to. The text already implicitly establishes that AIs are going to be just like human children. Thus, having lots of social-bonding time with AIs and integrating them into the community is going to lead to aligned AIs. QED.
Stripped of this analogizing, none of what this sentence says is a technical argument for why AIs will be safe or controllable or steerable. Nay, the opposite: if the paragraph I’m quoting from started by talking about incomprehensible alien intelligences with opaque goals tenuously inspired by a snapshot of the Internet containing lots of data on manipulating humans, the idea that they’d be “numerous” and “everywhere” and “interacting with us constantly” and “providing friendship” (something notably distinct from “being friends”, eh?) would have sounded starkly worrying.
The way the argument is shaped here is subtler than most cases of argument-by-analogy, in that you don’t literally say “AIs will be like human children”. But the association is very much invoked, and has a strong effect on your message.
And I would argue this is actually worse than if you came out and made a direct argument-by-analogy, because it might fool somebody into thinking you’re actually making an object-level technical argument. At least if the analogizing is direct and overt, someone can quickly see what your model is based on, and swiftly move onto picking at the ways in which the analogy may be invalid.
The alternative being demonstrated here is that we essentially have to have all the same debates, but through a secondary layer of metaphor, while pretending that these analogy-rooted arguments are actually Respectably Technical, meaning we’re only allowed to refute them by (likely much more verbose and hard-to-parse) Respectably Technical counter-arguments.
And I think AI Risk debates are already as tedious as they need to be.
The broader point I’m making here is that, unless you can communicate purely via strict provable mathematical expressions, you ain’t getting rid of analogies.
I do very much agree that there are some issues with the way analogies are used in the AI-risk discourse. But I don’t think “minimize the use of analogies” is good advice. If anything, I think analogies improve the clarity and the bandwidth of communication, by letting people more easily understand each other’s positions and what reference classes others are drawing on when making their points.
You’re talking about sneaking-in assumptions – well, as I’d outlined above, analogies are actually relatively good about that. When you’re directly invoking an analogy, you come right out and say what assumptions you’re invoking!
I was upfront about my intention with my language in that section. Portraying me as contradicting myself is misleading because I was deliberately being evocative in the section you critique, rather than trying to present an argument. That was the whole point. The language you criticized was marked as a separate section in my post in which I wrote:
I have now added this paragraph to the post to make my intention more clear:
“Again: The next section is not an argument. It is a deliberately evocative picture, to help compare my expectations of the future against the analogies I cited above. My main point in this post is that we should move away from a dependence on analogies, but if you need a “picture” of what I expect from AI, to compare it to your own, here is mine.”
More generally, my point is not that analogies should never be used, or that we should never try to use evocative language to describe things. My point is that we should be much more rigorous about when we’re presenting analogies, and use them much less frequently when presenting specific positions. When making a precise claim, we should generally try to reason through it using concrete evidence and models instead of relying heavily on analogies.
In the same way you find plenty of reasons why my evocative picture is misleading, I think there are plenty of ways the “AIs are alien” picture is misleading. We should not be selective about our use of analogies. Instead, we should try to reason through these things more carefully.
Fair enough. But in this case, what specifically are you proposing, then? Can you provide an example of the sort of object-level argument for your model of AI risk that is simultaneously (1) entirely free of analogies and (2) sufficiently evocative plus short plus legible, such that it can be used for effective messaging to people unfamiliar with the field (including the general public)?
Because I’m pretty sure that as far as actual technical discussions and comprehensive arguments go, people are already doing that. Like, for every short-and-snappy Eliezer tweet about shoggoth actresses, there’s a text-wall-sized Eliezer tweet outlining his detailed mental model of misalignment.
In this post, I’m not proposing a detailed model. I hope in the near future I can provide such a detailed model. But I hope you’d agree that it shouldn’t be a requirement that, to make this narrow point about analogies, I should need to present an entire detailed model of the alignment problem. Of course, such a model would definitely help, and I hope I can provide something like it at some point soon (time and other priorities permitting), but I’d still like to separately make my point about analogies as an isolated thesis regardless.
My counter-point was meant to express skepticism that it is actually realistically possible for people to switch to non-analogy-based evocative public messaging. I think inventing messages like this is a very tightly constrained optimization problem, potentially an over-constrained one, such that the set of satisfactory messages is empty. I think I’m considerably better at reframing games than most people, and I know I would struggle with that.
I agree that you don’t necessarily need to accompany any criticism you make with a ready-made example of doing better. Simply pointing out stuff you think is going wrong is completely valid! But a ready-made example of doing better certainly greatly enhances your point: an existence proof that you’re not demanding the impossible.
That’s why I jumped at that interpretation regarding your AI-Risk model in the post (I’d assumed you were doing it), and that’s why I’m asking whether you could generate such a message now.
To be clear, I would be quite happy to see that! I’m always in the market for rhetorical innovations, and “succinct and evocative gears-level public-oriented messaging about AI Risk” would be a very powerful tool for the arsenal. But I’m a-priori skeptical.
Analogies can be bad argumentation, while simultaneously being good illustration.
That the analogy is a good illustration is what has to be argued.
An analogy can be a good illustration of a concept that is entirely wrong.
How do you know that they establish nothing of importance?
At the very least, this seems to go both ways. Like, afaict, one of Quintin and Nora’s main points in “AI is Easy to Control” is that aligning AI is pretty much just like aligning humans, with the exception that we (i.e., backpropagation) have full access to the weights which makes aligning AI easier. But is aligning a human pretty much like aligning an AI? Can we count on the AI to internalize our concepts in the same way? Do humans come with different priors that make them much easier to “align”? Is the dissimilarity “AI might be vastly more intelligent and powerful than us” not relevant at all, on this question? Etc. But I don’t see them putting much rigor into that analogy—it’s just something that they assume and then move on.
It seems reasonable, to me, to request more rigor when using analogies. It seems pretty wild to request that we stop relying on them altogether, almost as if you were asking us to stop thinking. Analogies seem so core to me when developing thought in novel domains, that it’s hard to imagine life without them. Yes, there are many ways AI might be. That doesn’t mean that our present world has nothing to say about it. E.g., I agree that evolution differs from ML in some meaningful ways. But it also seems like a mistake to completely throw out a major source of evidence we have about how intelligence was produced. Of course there will be differences. But no similarities? And do those similarities tell us nothing about the intelligences we might create? That seems like an exceedingly strong claim.
I agree with Douglas Hofstadter’s claim that thinking even a single thought about any topic, without using analogies, is just impossible — see his 1-hour talk and his book-length treatment, which I have been gradually reading (the book is good but annoyingly long-winded).
(But note that the OP does not actually go so far as to demand that everyone stop using analogies.)
Argument by analogy is based on the idea that two things which resemble each other in some respects must resemble each other in others: that isn’t deductively valid.
Replacing must with may is a potential solution to the issues discussed here. I think analogies are misleading when they are used as a means of proof, i.e. convincing yourself or others of the truth of some proposition, but they can be extremely useful when they are used as a means of exploration, i.e. discovering new propositions worthy of investigation. Taken seriously, this means that if you find something of interest with an analogy, it should not mark the end of a thought process or conversation, but the beginning of a validation process: Is there just a superficial connection between the compared phenomena, or actually some deep one? Does it point to a useful model or abstraction?
Example: I think the analogy that trying to align an AI is like trying to steer a rocket towards any target at all shouldn’t be used to convince people that without proper alignment methods mankind is screwed. Who knows if directing a physical object in a geometrical space has much to do with directing a cognitive process in some unknown combinatorial space? Alternatively, the analogy could instead be used as a pointer towards a general class of control problems that come with specific assumptions, which may or may not hold for future AI systems. If we think that the assumptions hold, we may be able to learn a lot from existing instances of control problems like rockets and acrobots about future instances like advanced AIs. If we think that the assumptions don’t hold, we may learn something by identifying the least plausible assumption and trying to formulate an alternative abstraction that doesn’t depend on it, opening another path towards collecting empirical data points of existing instances.
I would ask the question:
Is the analogy being used to make a specific point, rather than trying to set up vague vibes-based associations?
If yes, good for you. If no, well there’s your problem.[1]
In particular,
future AI is like aliens in some ways but very unlike aliens in other ways;
future AI is like domesticated animals in some ways but very unlike them in other ways;
future AI is like today’s LLMs in some ways but very unlike them in other ways;
etc. All these analogies can be helpful or misleading, depending on what is being argued.
So anyway, if I were writing a post like this:
I would say that analogy usage can be bad, rather than saying that the analogies themselves are bad.
I would find examples of analogy-usage that are in fact actually bad, and then argue that they are bad. Instead, here you offer a list of 10 examples, but they’re all stripped from their context, so I can’t tell whether I endorse them or not.
I would suggest that optimists and pessimists, and more generally people on all sides of every issue, sometimes use analogies appropriately and sometimes use them inappropriately, and thus I would try to pick a more even-handed set of examples. (For example, you seem to be suggesting that nobody would really be so crazy as to use a Google Maps analogy, except in the strange imaginings of doomers, but in fact this is a real example—Ajeya almost certainly had this 2012 post by her colleague Holden at the back of her mind.)
It’s a bit more subtle than that, because when someone is using an analogy in a good way to make a valid point, they are usually simultaneously choosing words and metaphors that set up vibes in a way that supports their point. It’s fine to be sad that this happens—so am I—but people on all sides of every issue do it constantly.
I agree with the broad message of what I interpret you to be saying, and I do agree there’s some value in analogies, as long as they are used carefully (as I conceded in the post). That said, I have some nitpicks with the way you frame the issue:
I think it’s literally true that these analogies can be helpful or misleading depending on what’s being argued. However, my own personal experience of these things, based on my admittedly unreliable memory, is that when I use the “AIs will be like domesticated animals” analogy I generally get way more pushback, at least around spaces like LessWrong, than I think I’d get if I used the “AIs will be like aliens” analogy.
And this asymmetry, I feel, is pretty irrational. The pushback itself isn’t necessarily irrational; don’t misunderstand me: there are disanalogous elements here. It’s the selective pushback that I’m mainly complaining about. Are AIs really so similar to aliens — something we have literally no actual experience with — yet not similar to real physical objects we are familiar with, like LLMs and domesticated animals? For crying out loud, LLMs are already considered “AIs” by most people! How could they be a worse analogy for AI, across the board, than extraterrestrial beings we have never come into contact with?
Ideally, people invoke analogies in order to make a point. And then readers / listeners will argue about whether the point is valid or invalid, and (relatedly) whether the analogy is illuminating or misleading. I think it’s really bad to focus discussion on, and police, the analogy target, i.e. to treat certain targets as better or worse, in and of themselves, separate from the point that’s being made.
For example, Nora was just comparing LLMs to mattresses. And I opened my favorite physics textbook to a random page and there was a prominent analogy between electromagnetic fields and shaking strings. And, heck, Shakespeare compared a woman to a summer’s day!
So when you ask whether AIs are overall more similar to aliens, versus more similar to LLMs, then I reject the question! It’s off-topic. Overall, mattresses & LLMs are very different, and electric fields & strings are very different, and women & summer’s days are very different. But there’s absolutely nothing wrong with analogizing them!
And if someone complained “um, excuse me, but I have to correct you here, actually, LLMs and mattresses are very different, you see, for example, you can sleep on mattresses but you can’t sleep on LLMs, and therefore, Nora, you should not be saying that LLMs are like mattresses”, then I would be very annoyed at that person, and I would think much less of him. (We’ve all talked to people like that, right?)
…And I was correspondingly unhappy to see this post, because I imagine it backing up the annoying-person-who-is-totally-missing-the-point from the previous paragraph. I imagine him saying “You see? I told you! LLMs really are quite different from mattresses, and you shouldn’t analogize them. Check this out, here’s a 2000-word blog post backing me up.”
Of course, people policing the target of analogies (separately from the point being made in context) is a thing that happens all the time, on all sides. I don’t like it, and I want it to stop, and I see this post as pushing things in the wrong direction. For example, this thread is an example where I was defending myself against analogy-target-policing last month. I stand by my analogizing as being appropriate and helpful in context. I’m happy to argue details if you’re interested—it’s a nuclear analogy :-P
I can’t speak to your experience, but some of my reactions to your account are:
if people are policing your analogies between AI and domestic animals, the wrong response is to say that we should instead police analogies between AI and aliens; the right response is to say that analogy-target-policing is the wrong move and we should stop it altogether: we should not police the target of an analogy independently from the point being made in context.
I wonder if what you perceive to be analogy-target-policing is (at least sometimes) actually people just disagreeing with the point that you’re making, i.e. saying that the analogy is misleading in context
Yes, LessWrong has some partisans who will downvote anything with insufficiently doomy vibes without thinking too hard about it; sorry, I’m not happy about that either :-P (And vice-versa to some extent on EAForum… or maybe EAF has unthinking partisans on both sides, not sure. And Twitter definitely has an infinite number of annoying unthinking partisans on both sides of every issue.)
FWIW, you and I and everyone else are normally trying to talk about “future AI that might pose an x-risk”, which everyone agrees does not yet exist. A different category is “AI that does not pose an x-risk”, and this is a very big tent, containing everything from Cyc and GOFAI to MuZero and (today’s) LLMs. So the fact that people call some algorithm X by the term “AI” doesn’t in and of itself imply that X is similar to “future AI that might pose an x-risk” in any nontrivial way—it only implies that in the (trivial) ways that LLMs and MuZero and Cyc are all similar to each other (e.g. they all run on computers).
Now, there is a hypothesis that “AI that might pose an x-risk” is especially similar to LLMs in particular—much more than it is similar to Cyc, or to MuZero. I believe that you put a lot of stock in that hypothesis. And that’s fine—it’s not a crazy hypothesis, even if I happen personally to doubt it. My main complaint is when people forget that it’s a hypothesis, but rather treat it as self-evident truth. (One variant of this is people who understand how LLMs work but don’t understand how MuZero or any other kind of ML works, and instead they just assume that everything in ML is pretty similar to LLMs. I am not accusing you of that.)
By tending to lead to overconfidence.
An aliens analogy is explicitly relying on [we have no idea what this will do]. It’s easy to imagine friendly aliens, just as it’s easy to imagine unfriendly ones, or entirely disinterested ones. The analogy is unlikely to lead to a highly specific, incorrect model.
This is not true for LLMs. It’s easy to assume that particular patterns will continue to hold—e.g. that it’ll be reasonably safe to train systems with something like our current degree of understanding.
To be clear, I’m not saying they’re worse in terms of information content: I’m saying they can be worse in the terms you’re using to object to analogies: “routinely conveying the false impression of a specific, credible model of AI”.
I think it’s correct that we should be very wary of the use of analogies (though they’re likely unavoidable).
However, the cases where we need to be the most wary are those that seem most naturally applicable—these are the cases that are most likely to lead to overconfidence. LLMs, [current NNs], or [current AI systems generally] are central examples here.
On asymmetric pushback, I think you’re correct, but that you’ll tend to get an asymmetry everywhere between [bad argument for conclusion most people agree with] and [bad argument for conclusion most people disagree with].
People have limited time. They’ll tend to put a higher value on critiquing invalid-in-their-opinion arguments when those lead to incorrect-in-their-opinion conclusions (at least unless they’re deeply involved in the discussion).
There’s also an asymmetry in terms of consequences-of-mistakes here: if we think that AI will be catastrophic, and are wrong, this causes a delay, a large loss of value, and a small-but-significant increase in x-risk; if we think that AI will be non-catastrophic, and are wrong, we’re dead.
Lack of pushback shouldn’t be taken as a strong indication that people agree with the argumentation used.
Clearly this isn’t ideal.
I do think it’s worth thinking about mechanisms to increase the quality of argument.
E.g. I think the ability to emoji react to particular comment sections is helpful here—though I don’t think there’s one that’s great for [analogy seems misleading] as things stand. Perhaps there should be a [seems misleading] react?? (I don’t think “locally invalid” covers this)
As a matter of fact I think the word “alien” often evokes a fairly specific caricature that is separate from “something that’s generically different and hard to predict”. But it’s obviously hard for me to prove what’s going on in people’s minds, so I’ll just say what tends to flash in my mind when I think of aliens:
Beings who have no shared history with us, as they were created under a completely separate evolutionary history
A Hollywood stock image of an alien species that is bent on some goal, such as extermination (i.e. rapacious creatures who will stop at nothing to achieve something)
A being that does not share our social and cultural concepts
I think these things are often actually kind of being jammed into the analogy, intended or not, and the question of how much future AIs will share these properties is still an open question. I think we should not merely assume these things.
Being real or familiar has nothing to do with being similar to a given thing.
A general point unrelated to use of analogies for AI risk specifically is that a demand for only using particular forms of argument damages ability to think or communicate, by making that form of argument less available, even in cases where it would turn out to be useful.
An observation that certain forms of argument tend to be useless or misleading should take great care to guard against turning into a norm. If using a form of argument requires justification or paying a social cost of doing something that’s normally just not done, that makes it at least slightly inconvenient, and therefore it mostly stops happening in practice.
Even if the only acceptable forms are the only valid forms?
You write about how restricting the acceptable forms of argument can hamper the ability to communicate and to understand. I think there are many ways of getting at a truth, but getting attached to one form of attaining it just makes it harder to attain. In this case, analogies are an example of a form of argument, I believe, so I’d disagree with what you say at the beginning about this being unrelated to AI risk analogies.
I think analogies are a great way to introduce new ideas to people who are hearing an idea for the first time. Analogies help you learn from what you already know. It becomes a problem, I think, when you get attached to the analogy and try to make it fit your argument in a way that obscures the truth.
Ultimately, we are aiming to seek out truth, so it’s important to see what an analogy may be trying to portray, and as you learn more about a topic, you can let go of the analogy. I think learning about anything follows this formula of emptying your cup so that it can become full once again: being able to let go of previous understandings in exchange for a more accurate one.
The blog post you link for inconveniences also makes sense, since if I am learning about a new topic, I am much more likely to continue learning if it is made initially easy, with the difficulty and complexity of the topic scaling with my understanding.
If we are to not use analogies as a convenient way to get introduced into a new topic, what would be a good alternative that is somewhat simple to understand for a novice?
I just want to clarify my general view on analogies here, because I’d prefer not to be interpreted as saying something like “you should never use analogies in arguments”. In short:
I think that analogies can be good if they are used well in context. More specifically, analogies generally serve one of three purposes:
Explaining a novel concept to someone
Illustrating, or evoking a picture of a thing in someone’s head
An example in a reference class, to establish a base rate, or otherwise form the basis of a model
I think that in cases (1) and (2), analogies are generally bad as arguments, even if they might be good for explaining something. They’re certainly not bad if you’re merely trying to tell a story, or convey how you feel about a problem, or convey how you personally view a particular thing in your own head.
In case (3), I think analogies are generally weak arguments, until they are made more rigorous. Moreover, when the analogy is used selectively, it is generally misleading. The rigorous way of setting up this type of argument is to deliberately try to search for all relevant examples in the reference class, without discriminating in favor of ones that merely evoke your preferred image, to determine the base rate.
Yeah, seriously. As a field, why haven’t we outgrown analogies? It drives me crazy, how many loose and unsupported analogies get thrown around. To a first approximation: Please stop using analogies as arguments.
I sure don’t think I rely much on analogies when I reason or argue about AI risk. I don’t think I need them. I encourage others to use them less as well. It brings clarity of thought and makes it easier to respond to evidence (in my experience).
Here’s a post you wrote. I claim that it’s full of analogies. :)
E.g.
you say “desperately and monomaniacally” (an analogy between human psychology and an aspect of AI),
“consider two people who are fanatical about diamonds…” (ditto),
“consider a superintelligent sociopath who only cares about making toasters…” (an analogy between human personality disorders and an aspect of AI),
“mother … child” (an analogy between human parent-child relationships and an aspect of AI). Right?
(…and many more…)
I’m guessing you’ll say “yeah but I was making very specific points! That’s very different from someone who just says ‘AI is like aliens in every respect, end of story’”. And I agree!
…But the implication is: if someone is saying “AI is like aliens” in the context of making a very specific point, we should likewise all agree that that’s fine, or more precisely, their argument might or might not be a good argument, but if it’s a bad argument, it’s not a bad argument because it involves an analogy to aliens, per se.
I can give non-analogical explanations of every single case, written in pseudocode or something similarly abstract. Those are mostly communication conveniences, though I agree they shade in connotations from human life, which is indeed a cost. Maybe I should say—I rarely use analogies as load-bearing arguments instead of just shorthand? I rarely use analogies without justifying why they share common mechanisms with the technical subject being discussed?
This is The Way :)
I guess I’m missing something crucial here.
How could you reason without analogies about a thing that doesn’t exist (yet)?
Ehhh, the whole field is a pile of analogies. Artificial neural networks bear little resemblance to biological ones, “reward” in reinforcement learning has nothing to do with what we usually mean by reward, “attention” layers certainly don’t capture the mechanism of psychological attention...
You say it’s only a small part of your argument, but to me this difference in outlook feels like a crux. I don’t share your views of what the “default picture” probably looks like, but if I did, I would feel somewhat differently about the use of analogies.
For example, I think your “straightforward extrapolation of current trends” is based on observations of current AIs (which are still below human-level in many practical senses), extrapolated to AI systems that are actually smarter and more capable than most or all humans in full generality.
On my own views, the question of what the future looks like is primarily about what the transition looks like between the current state of affairs, in which the state and behavior of most nearby matter and energy is not intelligently controlled or directed, to one in which it is. I don’t think extrapolations of current trends are much use in answering such questions, in part because they don’t actually make concrete predictions far enough into the future.
For example, you write:
I find this sorta-plausible as a very near-term prediction about the next few years, but I think what happens after that is a far more important question. And I can’t tell from your description / prediction about the future here which of the following things you believe, if any:
No intelligent system (or collection of such systems) will ever have truly large-scale effects on the world (e.g. re-arranging most of the matter and energy in the universe into computronium or hedonium, to whatever extent that is physically possible).
Large-scale effects that are orders of magnitude larger or faster than humanity can currently collectively exert are physically impossible or implausible (e.g. that there are diminishing returns to intelligence past human-level, in terms of the ability it confers to manipulate matter and energy quickly and precisely and on large scales).
Such effects, if they are physically possible, are likely to be near-universally directed ultimately by a human or group of humans deliberately choosing them.
The answer to these kinds of questions is currently too uncertain or unknowable to be worth having a concrete prediction about.
My own view is that you don’t need to bring in results or observations of current AIs to take a stab at answering these kinds of questions, and that doing so can often be misleading, by giving a false impression that such answers are backed by empiricism or straightforwardly-valid extrapolation.
My guess is that close examination of disagreements on such topics would be more fruitful for identifying key cruxes likely to be relevant to questions about actually-transformative smarter-than-human AGI, compared to discussions centered around results and observations of current AIs.
I admit that a basic survey of public discourse seems to demonstrate that my own favored approach hasn’t actually worked out very well as a mechanism for building shared understanding, and moreover is often frustrating and demoralizing for participants and observers on all sides. But I still think such approaches are better than the alternative of a more narrow focus on current AIs, or on adding “rigor” to analogies that were meant to be more explanatory / pedagogical than argumentative in the first place. In my experience, the end-to-end arguments and worldviews that are built on top of more narrowly-focused / empirical observations and more surface-level “rigorous” theories, are prone to relatively severe streetlight effects, and often lack local validity, precision, and predictive usefulness, just as much or more so than many of the arguments-by-analogy they attempt to refute.
Here are my thoughts about this post:
First, ‘well played, my friend’ re: your interaction with Thane Ruthenis. Perhaps like Thane, I saw red when reading your analogy, but then I remembered that this was sorta exactly your point: analogies are weak evidence, you can use them fairly easily to argue for all sorts of things, and so the various pro-AI-risk-concern analogies you dislike can be mirrored by anti-AI-risk-concern analogies as well. (Here I’ll also remind the reader of Yann LeCun’s analogy of AI to ballpoint pens...)
Here’s a suggestion for rule of thumb about how to use analogies; would you agree with this?
(1) People are free to make them but they are generally considered low-tier evidence, trumped by models and reference classes.
(2) NOTE: If you find yourself playing reference class tennis, what you are doing is analogy, not reference classes. Or rather the difference between analogy and reference class is a spectrum, and it’s the spectrum of how hard it is to play tennis. Something is a good reference class insofar as it’s uncontroversial.
(3) It is valid to reply to analogies with counter-analogies. “An eye for an eye.” However, it is usually best to do so briefly and then transition the dialogue into evaluating the analogies, i.e. pointing out ways in which they are relevantly similar or dissimilar to the case in question.
(4) And as said in point #1, even better is having actual models. But bandying about analogies and analyzing them is often a good first step towards having actual models!
I think some of the quotes you put forward are defensible, even though I disagree with their conclusions.
Like, Stuart Russell was writing an opinion piece in a newspaper for the general public. Saying AGI is “sort of like” meeting an alien species seems like a reasonable way to communicate his views, while making it clear that the analogy should not be treated as 1 to 1.
Similarly, with Rob Wiblin, he’s using the analogy to get across one specific point: that future AI may be very different from current AI. He also disclaims with the phrase “a little bit like” so people don’t take it too seriously. I don’t think people would come away from reading this thinking that AI is directly analogous to an octopus.
Now, compare these with Yudkowsky’s terrible analogy. He states outright: “The AI is an unseen actress who, for now, is playing this character.” No disclaimers, no specifying which part of the analogy is important. It directly leads people into a false impression about how current-day AI works, based on an incredibly weak comparison.
Even if there are risks to using analogies for persuasion, we need analogies in order to persuade people. While a lot of people here are strong abstract thinkers, that is really rare; most people need something more concrete to latch onto. Uniform disarmament here is a losing strategy, and it isn’t justified in any case, as I don’t think the analogies are as weak as you think. If you tell me what you consider to be the two weakest analogies above, I’m sure I’d be able to steelman at least one of them.
If we want to improve epistemics, a better strategy would probably be to always try to pair analogies (at least for longer texts/within reason). So identify an analogy to describe how you think about AI, identify an alternate plausible analogy for how you should think about it and then explain why your analogy is better/whereabouts you believe AI lies between the two.
Of course! Has there ever been a single person in the entire world who has embraced all analogies instead of useful and relevant analogies?
Maybe you’re claiming that AI risk proponents reject analogies in general when someone uses an analogy that supports the opposite conclusion, but accept the validity of analogies when they support their own conclusion. If this were the case, it would be bad, but I don’t actually think this is what is happening. My guess would be that you’ve seen situations where someone has used an analogy to critique AI safety and then the AI safety person said something along the lines of “Analogies are often misleading”, and you took this as a rejection of analogies in general as opposed to a reminder to check whether the analogy actually applies.
I think the AIs-will-be-alien model is great & yet every sentence here is also part of my model, so I think you are being a bit unfair to me & my position by making it sound otherwise.
I cut off the bit about “behavior will be determined directly by our engineering” because that is false and will remain false until we make a lot more progress in mechinterp and related fields.
I think this post misses one of the points of analogies. Often, if someone thinks “AI is like a human”, you can’t say “no, AI is totally not like a human” and expect your point to be understood. Often the point of an analogy is to defamiliarize the subject.
I agree but the same problem exists for “AIs are like aliens”. Analogies only take you so far. AIs are their own unique things, not fully like anything else in our world.
This prompted me to think about this analogy for a few hours, and writing down my thoughts here. Would be interested to know if you have any comments on my comments.
Also, I think this serves as a positive example for arguing by analogy, showing that it’s possible to make intellectual progress this way, going from evolution-as-alignment to within-lifetime-learning-as-alignment to my latest analogy (in the above comment) of evolution-as-alignment-researcher, each perhaps contributing to a better overall understanding.
I think this post shifts the burden to risk-concerned folks and only justifies that risk with its own poor analogies. 7⁄10 of the analogy-makers you cite here have pdooms <=50%, so they are only aiming for plausibility. You admit that the analogies point to the “logical possibility” of stark misalignment, but one person’s logical possibility is another person’s plausibility.
To give an example, Golden Retrievers are much more cherry-picked than Cotra’s lion/chimpanzee examples. Of all the species on Earth, the ones we’ve successfully domesticated are a tiny, tiny minority. Maybe you’d say we have a high success rate when we try to domesticate a species, but that took a long time in each case and is still meaningfully incomplete. I think 7⁄10 are advocating for taking the time with AIs and would be right to say we shouldn’t expect e.g. lion domestication to happen overnight, even though we’re likely to succeed eventually.
The presentation of Quintin’s alternate evolution argument seems plausible, but not clearly more convincing than the more common version one might hear from risk-concerned folks. Training fixes model weights to some degree, after which you can do some weaker adjustments with things like fine-tuning, RLHF, and (in the behavioral sense, maybe) prompt engineering. Our genes seem like the most meaningfully fixed thing about us, and those are ~entirely a product of our ancestors’ performance, which is heavily weighted toward the more stable pre-agricultural and pre-industrial human environments.
My basic response is that, while you can find reasons to believe that the golden retriever analogy is worse than the chimpanzee analogy, you can equally find reasons to think the chimpanzee analogy is worse, and there isn’t really a strong argument either way. For example, it’s not a major practice among researchers to selectively breed chimpanzees, as far as I can tell, whereas by contrast AIs are selected (by gradient descent) to exhibit positive behaviors. I think this is actually a huge weakness in the chimp analogy, since “selective breeding” looks way more similar to what we’re doing with AIs compared to the implicit image of plucking random animals from nature and putting them into our lives.
But again, I’m not really trying to say “just use a different analogy”. I think there’s a big problem if we use analogies selectively at all; and if we’re doing that, we should probably try to be way more rigorous.
I don’t know how I hadn’t seen this post before now! A couple weeks after you published this, I put out my own post arguing against most applications of analogies in explanations of AI risk. I’ve added a couple references to your post in mine.
If someone wants to establish probabilities, they should be more systematic, and, for example, use reference classes. It seems to me that there’s been little of this for AI risk arguments in the community, but more in the past few years.
Maybe reference classes are kinds of analogies, but more systematic and so less prone to motivated selection? If so, then it seems hard to forecast without “analogies” of some kind. Still, reference classes are better. On the other hand, even with reference classes, we have the problem of deciding which reference class to use or how to weigh them or make other adjustments, and that can still be subject to motivated reasoning in the same way.
We can try to be systematic about our search and consideration of reference classes, and make estimates across a range of reference classes or weights to them. Do sensitivity analysis. Zach Freitas-Groff seems to have done something like this in AGI Catastrophe and Takeover: Some Reference Class-Based Priors, for which he won a prize from Open Phil’s AI Worldviews Contest.
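As a toy illustration of that kind of sensitivity analysis (every reference class and number below is an arbitrary placeholder, not anyone’s real estimate), one can combine base rates under a few different weightings and see how much the headline figure moves:

```python
# Toy sensitivity analysis over reference-class weightings.
# All base rates and weights below are arbitrary placeholders.

base_rates = {
    "transformative technologies that caused catastrophe": 0.05,
    "less capable species displaced by more capable ones": 0.60,
    "powerful engineered tools that escaped human control": 0.10,
}

weightings = {
    "weighting A": {"transformative technologies that caused catastrophe": 0.5,
                    "less capable species displaced by more capable ones": 0.2,
                    "powerful engineered tools that escaped human control": 0.3},
    "weighting B": {"transformative technologies that caused catastrophe": 0.2,
                    "less capable species displaced by more capable ones": 0.6,
                    "powerful engineered tools that escaped human control": 0.2},
}

for name, weights in weightings.items():
    # Weighted average of the base rates under this weighting.
    estimate = sum(weights[k] * base_rates[k] for k in base_rates)
    print(f"{name}: {estimate:.2f}")
```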
Of course, we don’t need to use direct reference classes for AI risk or AI misalignment. We can break the problem down.
People aren’t blank slates though.
If you talk to a very uninformed person about AI, you’ll find they have all sorts of broken assumptions. I think I was once in a Twitter discussion with someone about AI and they said that ChatGPT doesn’t hallucinate. Their evidence for this was that they’d asked ChatGPT whether it hallucinates and it said that it didn’t.
I think your beliefs about the future of AI could fit into an analogy too: AI will be like immigrants. They’ll take our jobs and change the character of society.
So, perhaps analogies are a crude tool that is sometimes helpful.