I wrote a post to my Substack attempting to compile all of the best arguments against AI as an existential threat.
Some arguments that I discuss include: international game theory dynamics, reference class problems, Knightian uncertainty, superforecaster and domain expert disagreement, the issue with long-winded arguments, and more!
Please tell me why I’m wrong, and if you like the article, subscribe and share it with friends!
“Long-winded arguments tend to fail” is a daring section title in your 5,000-word essay :P
In general, I think the genre “collect all the arguments I can find on only one side of a controversial topic” is bound to lead to lower-quality inclusions, and that section is probably among them. I prefer the genre “collect the best models I can find of a controversial topic and try to weigh them.”
Why is “The argument for AI risk has a lot of necessary pieces, therefore it’s unlikely” a bad argument?
Any real-world prediction can be split into an arbitrary number of conjunctive pieces.
“It will rain tomorrow” is the prediction “it will rain over my house, and the house next to that, and the house next to that, and the house across the street, and the house next to it, and the house next to that, etc.” Surely any conjunction with so many pieces is doomed to failure, right? But wait, the prediction “it won’t rain tomorrow” has the same problem.
Could you split the claim that future AI will do good things and not bad things into lots of conjunctions?
It’s easy to get sloppy about whether pieces are really necessary or not.
E.g. “The current neural net paradigm continues” isn’t strictly necessary for future AI doing bad things rather than good things. But since we’re just sort of making a long list, there’s the temptation to just slap it onto the list and not worry about the leakage of probability-mass when it turns out not to be necessary. But if each step leaks a little, and you’re making a list of a large number of steps… well, point is it’s sometimes easy to fool yourself with this argument structure.
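To make the leakage concrete, here’s a toy calculation (all numbers invented for illustration): if you multiply in a step that isn’t strictly necessary, you drop the probability flowing through the worlds where that step fails but the conclusion still holds.

```python
# Toy illustration (numbers invented): treating a non-necessary step
# as necessary leaks probability mass.
p_step = 0.5        # P(the listed step, e.g. "current paradigm continues")
p_given_step = 0.2  # P(conclusion | step holds)
p_given_not = 0.1   # P(conclusion | step fails) -- nonzero, so the
                    # step wasn't actually necessary

# Sloppy estimate: multiply the step in as if it were required.
sloppy = p_step * p_given_step  # 0.10

# Correct estimate: sum over both branches (law of total probability).
correct = p_step * p_given_step + (1 - p_step) * p_given_not  # 0.15

print(sloppy, correct)  # the sloppy estimate drops a third of the mass
```

One leak like this is small; sixteen of them compound.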
Figuring out conditional probabilities is hard.
At the start of the list, the probabilities are, while not easy, at least straightforward to understand. E.g. your item 1 is “P(we have enough training data, somehow)”. This is hard to predict, but at least it’s clear what you mean. But every step of the list has to be conditional on all the previous steps.
(This was the flaw with the argument that “it will rain tomorrow” has low probability. Once it’s raining on the first 99 houses, adding the 100th house to the list doesn’t move the probabilities much.)
So by the time you’re at item 16, you’re not just asking P(item 16), you’re asking P(item 16, given that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15 are true), which is often an unintuitive slice of reality it’s hard to estimate probabilities about.
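The rain example can be made quantitative (with invented numbers): treating the 100 houses as independent conjuncts drives the probability to essentially zero, while conditioning each house on its neighbors leaves it near the sensible base rate.

```python
# Toy rain example (numbers invented for illustration).
p_rain = 0.3               # P(rain over the first house)
p_next_given_prev = 0.999  # P(rain over house i | rain over houses 1..i-1)
n_houses = 100

# Wrong: treat the 100 sub-events as independent conjuncts.
independent = p_rain ** n_houses  # astronomically small

# Right: chain rule with conditional probabilities.
conditional = p_rain * p_next_given_prev ** (n_houses - 1)

print(f"{independent:.2e}")  # ~5.15e-53: "surely doomed to failure"
print(f"{conditional:.3f}")  # ~0.272: basically just P(rain)
```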
I laughed at your first line, so thank you for that lol. I would love to hear more about why you prefer collecting models over arguments, because I don’t think I intuitively get why this would be better—to be fair, I probably haven’t spent enough time thinking about it. Any references you like that argue for this would be super helpful!
I agree that many (even simple) arguments can be split up into many pieces—this is a good point. I would still say, though, that some arguments are more complicated than others (i.e., more premises with lower probabilities) and should receive lower credences. I assume you will agree with this, and maybe you can let me know precisely how to get the AGI doom argument off the ground with the fewest contingent propositions—that would be really helpful.
Totally agree—it’s pretty hard to determine what will be necessary, and this could lead to argument sloppiness. Though I don’t think we should throw our hands in the air, say the argument is sloppy, and ignore it (I am not saying that you are doing or plan to do this, for the record); I only mean to say that it should count for something, and I leave it up to the reader to figure out what.
One separate thing I would say, though, is that the asterisk by that item indicated (as stated at the beginning of the section) that it was not necessary for the proposition that AI is an existential threat—it only helps the argument. This is true for many things on that list.
Yea—you’re totally right. They’re not independent propositions, which makes this pretty complicated (I did briefly note the fact that they would have to be independent, and I thought it was clear enough that they weren’t, but maybe not). I agree it’s really difficult to estimate probabilities on this basis, and I recommend big error bars and less certainty!
Thanks for the helpful feedback, though!
Well, if you’re really shooting for the least, you’ve already found the structure: just frame the argument for non-doom in terms of a lot of conjunctions (you have to build AI that we understand how to give inputs to, and also you have to solve various coordination and politics problems, and also you have to be confident that it won’t have bugs, and also you have to solve various philosophical problems about moral progress, etc.), and make lots of independent arguments for why non-doom won’t pan out. This body of arguments will touch on lots of assumptions, but very few will actually be load-bearing. (Or if you actually wrote things in Aristotelian logic, it would turn out that there were few contingent propositions, but some of those would be really long disjunctions.)
By actually making arguments you sidestep flaw #1 (symmetry), and somewhat lessen #3 (difficulty of understanding conditional probabilities). But #2 remains in full force, which is a big reason why this doesn’t lead to One Objectively Best Argument Everyone Agrees On.
(Other big reasons include the fact that arguing from less information isn’t always good—when you know more contingent facts about the world you can make better predictions—and counting number of propositions isn’t always a good measure of amount of information assumed, often leaving people disagreeing about which premises are “actually simpler”)
For what it’s worth, I find that you are equivocating in a strange way between endorsing and not endorsing these arguments.
On the one hand, here in this post you called them “the best arguments” and “tell me why I’m wrong”, which sounds a lot like an endorsement. And your post title also sounds an awful lot like an endorsement.
On the other hand, in the substack text, you say at the top that you don’t have an opinion, and you state objections without stating in your own voice that you think the objections are any good. For example, “Yann LeCun argues that the need to dominate is purely a social phenomena that does not develop because of intelligence.” Well, yes, that is true, Yann LeCun does say that. But do you think it’s a good argument? If so, you should say that! (I sure don’t!—See e.g. here.)
I think you should pick one or the other rather than equivocating. If you really don’t know where you stand, then you should retitle your post etc. Or if you find some of the arguments compelling, you should say that.
Tbh, I don’t think what I think is actually so important. The project was mainly to take arguments and compile them in the way I found most convincing. These arguments have various degrees of validity in my mind, but I don’t know how much stating those credences actually matters.
Also, and this is definitely not your fault for not catching this, I write “tell me why I’m wrong” at the end of every blog post, so it was not a statement of endorsement. My previous blog post is titled “Against Utilitarianism,” but I would largely consider myself to be a utilitarian (as I write there).
Also, I can think the best arguments for a given position are still pretty bad.
I much appreciate the constructive criticism, however.
On the contrary, I think that it is really important for the writer of the article to say what they think and why. Without that, it’s just a shapeless list of here’s an argument someone once made, and here’s another argument that someone else once made, and here’s more. If the writing is not powered by a drive to discover what is true, it’s of no more interest to me than a chatbot’s ramblings.
+1 to one of the things Charlie said in his comment, but I’d go even further:
The proposition “The current neural architecture paradigm can scale up to Artificial General Intelligence (AGI) (especially without great breakthroughs)” is not only unnecessary for the proposition “AI is an extinction threat” to be true, it’s not even clear that it’s evidence for the proposition “AI is an extinction threat”! One could make a decent case that it’s evidence against “AI is an extinction threat”! That argument would look like: “we’re gonna make AGI sooner or later, and LLMs are less dangerous than alternative AI algorithms for the following reasons …”.
As an example, Yann LeCun thinks AGI will be a different algorithm rather than LLMs, and here’s my argument that the AGI algorithm LeCun expects is actually super-dangerous. (LeCun prefers a different term to “AGI” but he’s talking about the same thing.)
I’m trying to figure out where you were coming from when you brought up “The current neural architecture paradigm can scale up to Artificial General Intelligence (AGI) (especially without great breakthroughs)” as a necessary part of the argument.
One possibility is, you’re actually interested in the question of whether transformer-architecture self-supervised (etc.) AI is an extinction threat or not. If so, that’s a weirdly specific question, right? If it’s not an extinction threat, but a different AI algorithm is, that would sure be worth mentioning, right? But fine. If you’re interested in that narrow question, then I think your post should have been titled “against transformer-architecture self-supervised (etc) AI as an extinction threat” right? Related: my post here.
Another possibility is, you think that the only two options are, either (1) the current paradigm scales to AGI, or (2) AGI is impossible or centuries away. If so, I don’t know why you would think that. For example, Yann LeCun and François Chollet are both skeptical of LLMs, but separately they both think AGI (based on a non-LLM algorithm) is pretty likely in the next 20 years (source for Chollet). I’m more or less in that camp too. See also my brief comment here.
The claim “The current neural architecture paradigm can scale up to Artificial General Intelligence (AGI) (especially without great breakthroughs)” was not claimed to be necessary for the proposition (as was indicated with the asterisk and the statement toward the beginning of that section). This is gonna sound bad, but it is not meant to be attacking in any way: while I really appreciate the counterarguments, please read the section more carefully before countering it.
I read it, just in case there were any new, good arguments there. They’re just the same ones I’ve seen (after spending a bunch of time over the years looking at arguments on both sides). I guess it’s a useful project to collect the arguments, but it seems like any real work needs to be done in looking at how well they stand up to counterarguments. Which they do not do well.
The arguments you list do refute some arguments for AI x-risk, but not the important ones. They are largely driven by motivated reasoning, commonly called “wishful thinking”. Or sometimes the motivation is “AI doomers drive me crazy, I’d really like a way to make them look stupid”. I’m afraid irritating people is a great way to lose arguments even when you’re right; I’ve experimented with this a fair amount.
As the commenter on your substack says, all of these have been refuted; the refutations have not been refuted. I keep my eyes out and potentially waste time looking at the arguments against, because I’m fascinated by the public debate process on this topic (and how it might well get us all killed, as a secondary interest).
These arguments fall into several categories.
a) against specific arguments for AI risk, therefore not actually against AI risk overall
(the “complex arguments fail” argument in your piece, for instance; the strong arguments for AI risk are simple)
b) misunderstanding the arguments for AI risk
(e.g., LeCun’s argument in your post simply ignores the real argument for instrumental convergence creating subgoals of power-seeking)
c) assuming or arguing that we won’t have fully general intelligence any time soon
Possible, but you’d have to be a super expert and have strong reasons to be anything like sure about this. The people making this argument are lacking the expertise, the arguments, or both.
This isn’t to say that AI is certain to kill us, just that there are no existing arguments for it being unlikely. Since we don’t know, the logical thing to do would be to think hard about it and try to take some precautions against AGI x-risk. I think the reality is way worse than that (there’s a very good chance we’ll be pushed aside by the AGI we create, and we’re way behind on thinking about how to avoid that), but at the least, the sane conclusion from just the surface-level arguments is “do we have enough people working on that?”, not “I found an argument that gives me an excuse to go full steam ahead without thinking about risks anymore”!
My favorite from your list is the one about bad reference classes. I agree that arguments from reference classes are bad arguments. They’re useful as intuition pumps, but we need to think about the AGI we’ll actually build, not historical examples, because there are none that match closely enough to give more than a vague guess. Thinking about how things will go requires actual expertise in the relevant fields: AI, psychology, game theory, etc.
Thanks for the comment; I appreciate the response! One thing: I would say that people should generally avoid assuming bad motives or bad epistemics (i.e., motivated reasoning) unless it’s pretty obvious (which I don’t think is the case here) and can be resolved by pointing it out. This usually doesn’t help any party get closer to the truth, and if anything, it creates bad faith among people, leading them to actually develop other motives (which is bad, I think).
I also would be interested in what you think of my response to the argument that the commenter made.
Excellent point about the potential to create polarization by accusing one side of motivated reasoning.
This is tricky, though, because it’s also really distorting our epistemics if we think that’s what’s going on, but won’t say it publicly. My years of research on cognitive biases have made me think that motivated reasoning is ubiquitous, a large effect, and largely unrecognized.
One approach is to always mention motivated reasoning from the other side. Those of us most involved in the x-risk discussion have plenty of emotional reasons to believe in doom. These include in-group loyalty and the desire to be proven right once we’ve staked out a position. And laypeople who just fear change may be motivated to see AI as a risk without any real reasoning.
But most doomers are also technophiles and AI enthusiasts. I try to monitor my biases, and I can feel a huge temptation to overrate arguments for safety and finally be able to say “let’s build it!” We tend to believe that successful AGI would pretty rapidly usher in a better world, including potentially saving us and our loved ones from the pain and horror of disease and involuntary death. Yet we argue against building it.
It seems like motivated reasoning pushes harder on average in one direction on this issue.
Yet you’re right that accusing people of motivated reasoning sounds like a hostile act if one doesn’t take it as likely that everyone has motivated reasoning. And the possible polarization that could cause would be a real extra problem. Avoiding it might be worth distorting our public epistemics a bit.
An alternative is to say that these are complex issues, and reasoning about likely future events with no real reference class is very difficult, so the arguments themselves must be evaluated very carefully. When they are, arguments for pretty severe risk seem to come out on top.
To be fair, I think that Knightian uncertainty plays a large role here; I think very high p(doom) estimates are just as implausible as very low ones.
First let me say that I really appreciate your attitude toward this, and your efforts to fairly weigh the arguments on both sides, and help others do the same. That’s why I’m spending this much time responding in detail; I want to help with that project.
I looked at your response to that argument. I think we’re coming from a pretty different perspective on how to evaluate arguments. My attitude is uncommon in general, but I think more or less the default on LessWrong: I think debates are a bad way to reach the truth. They use arguments as soldiers. I think arguments about complex topics in science, including AI alignment/risk, need to be evaluated in-depth, and this is best done cooperatively, with everyone involved making a real effort to reach the truth, and therefore real effort to change their mind when appropriate (since it’s emotionally hard to do that).
It sounds to me like you’re looking at arguments more the way they work in a debate or a Twitter discussion. If we are going to make our decisions that way as a society, we’ve got a big problem. I’m afraid we will and do. But that’s no reason to make that problem worse by evaluating the truth that way when we’ve got time and goodwill. And no reason to perpetuate that mode of reasoning by publishing lists of arguments without commentary on whether they’re good arguments in your relatively expert opinion. If people publish lists of reasons for x-risk that include bad arguments, I really wish they wouldn’t, for this reason.
To your specific points:
Fair enough, and a worthy project. But you’ve got to draw the line somewhere on an argument’s validity. Or better yet, assign each a credence. For instance, your “intuition says this is weird” is technically a valid argument, but it should add only a tiny amount of credence; this is a unique event in history, so we’d expect intuition to be extraordinarily bad at it.
Technically true, but this is not, as the internet seems to think, the best kind of true. As above, assigning a credence is the right (although time-consuming) way to deal with this. If you don’t, you’re effectively setting a threshold for inclusion, and an argument that is 99% invalidated (like LeCun’s “just don’t make them seek dominance” argument) probably shouldn’t be included.
As I said in the opener, I think that’s how arguments work in debates or on Twitter, but it’s not at all true of good discussions. With cooperative participants, it may take a good deal of discussion to identify cruxes and clarify the topic in ways that work toward agreement and better estimates of the truth. Presenting the arguments to relative novices similarly requires that depth before they’re going to understand what the argument really says, what assumptions it depends on, and therefore how valid it is to them.
Thanks! Honestly, I think this kind of project needs to get much more appreciation and should be done more by those who are very confident in their positions and would like to steelman the other side. I also often hear people who are very confident about their beliefs but truly have no idea what the best counterarguments are—maybe this is uncommon, but I went to an in-person rationalist meetup like last week, and the people were really confident but hadn’t heard of a bunch of these counterarguments, which I thought is not at all in the LessWrong spirit. That interaction was one of my inspirations for the post.
I think I agree, but I’m having a bit of trouble understanding how you would evaluate arguments so differently from how I do now. I would say my method is pretty different from that of Twitter debates (in many ways, I am very sympathetic to and influenced by the LessWrong approach). I could have made a list of cruxes for each argument, but I didn’t want the post to be too long (far fewer would read it), which is why I recommended that people first get a grasp on the general arguments for AI being an existential risk right at the beginning. (Adding a credence or range, I think, is pretty silly given that people should be able to assign their own, and I’m just some random undergrad on the internet.)
Yep—I totally agree. I don’t personally take the argument super seriously (though I attempted to steelman it, as I think other people take it very seriously). I was initially going to respond to every argument, but I didn’t want to make a 40+ minute post. I also did qualify that claim a bunch (as I did with others, like the intractability argument).
Fair point. I do think the LeCun argument misunderstands a bunch about different aspects of the debate, but he’s probably smarter than me.
I think I’m gonna have to just disagree here. While I definitely think finding cruxes is extremely important (and this sometimes requires much back and forth), there is a certain way arguments can go back and forth that I tend to think has little (and should have little) influence on beliefs—I’m open to being wrong, though!
Different but related point:
I think, generally, I largely agree with you on many things you’ve said and just appreciate the outside view more. A modest epistemology of sorts. Even if I don’t find an argument super compelling, if a bunch of people that I think are pretty smart do (Yann LeCun has done some groundbreaking work in AI stuff, so that seems like a reason to take him seriously), I’m still gonna write about it. This is another reason why I didn’t put credences on these arguments—let the people decide!