The date of AI Takeover is not the day the AI takes over
Instead, it’s the point of no return—the day we AI risk reducers lose the ability to significantly reduce AI risk. This might happen years before classic milestones like “World GWP doubles in four years” and “Superhuman AGI is deployed.”
The rest of this post explains, justifies, and expands on this obvious but underappreciated idea. (Toby Ord appreciates it; see the quote below.) I found myself explaining it repeatedly, so I wrote this post as a reference.
AI timelines often come up in career planning conversations. Insofar as AI timelines are short, career plans which take a long time to pay off are a bad idea, because by the time you reap the benefits of the plans it may already be too late. It may already be too late because AI takeover may already have happened.
But this isn’t quite right, at least not when “AI takeover” is interpreted in the obvious way, as meaning that an AI or group of AIs is firmly in political control of the world, ordering humans about, monopolizing violence, etc. Even if AIs don’t yet have that sort of political control, it may already be too late. Here are three examples: [UPDATE: More fleshed-out examples can be found in this new post.]
- Superhuman agent AGI is still in its box, but nobody knows how to align it, other actors are going to make their own versions soon, and there isn’t enough time to convince them of the risks. They will make and deploy agent AGI; it will be unaligned; and we will have no way to oppose it except with our own unaligned AGI. Even if it takes the unaligned AGI years to actually conquer the world, it’s already game over.
- Various weak and narrow AIs are embedded in the economy and beginning to drive a slow takeoff; capabilities are improving much faster than safety/alignment techniques, and due to all the money being made there’s too much political opposition to slowing down capability growth or keeping AIs out of positions of power. We wish we had done more safety/alignment research earlier, or built a political movement earlier when opposition was lower.
- Persuasion tools have destroyed collective epistemology in the relevant places. AI isn’t very capable yet, except in the narrow domain of persuasion, but everything has become so politicized and tribal that we have no hope of getting AI projects or governments to take AI risk seriously. Their attention is dominated by the topics and ideas of powerful ideological factions that have access to more money and data (and thus better persuasion tools) than us. Alternatively, maybe we ourselves have fallen apart as a community, or become less good at seeking the truth and finding high-impact plans.
Conclusion: We should remember that when trying to predict the date of AI takeover, what we care about is the date it’s too late for us to change the direction things are going; the date we have significantly less influence over the course of the future than we used to; the point of no return.
This is basically what Toby Ord said about x-risk: “So either because we’ve gone extinct or because there’s been some kind of irrevocable collapse of civilization or something similar. Or, in the case of climate change, where the effects are very delayed that we’re past the point of no return or something like that. So the idea is that we should focus on the time of action and the time when you can do something about it rather than the time when the particular event happens.”
Of course, influence over the future might not disappear all on one day; maybe there’ll be a gradual loss of control over several years. For that matter, maybe this gradual loss of control began years ago and continues now… We should keep these possibilities in mind as well.
[Edit: I now realize that I should distinguish between AI-induced points of no return and other points of no return. Our timelines forecasts and takeoff speeds discussions are talking about AI, so we should interpret them as being about AI-induced points of no return. Our all-things-considered view on e.g. whether to go to grad school should be informed by AI-induced-PONR timelines and also “timelines” for things like nuclear war, pandemics, etc.]
This post is making a valid point (the time to intervene to prevent an outcome that would otherwise occur is going to be before the outcome actually occurs), but I’m annoyed with the mind projection fallacy by which this post seems to treat “point of no return” as a feature of the territory, rather than your planning algorithm’s map.
(And, incidentally, I wish this dumb robot cult still had a culture that cared about appreciating cognitive algorithms as the common interest of many causes, such that people would find it more natural to write a post about “point of no return”-reasoning as a general rationality topic that could have all sorts of potential applications, rather than the topic specifically being about the special case of the coming robot apocalypse. But it’s probably not fair to blame Kokotajlo for this.)
The concept of a “point of no return” only makes sense relative to a class of interventions. A 1 kg ball is in free fall, accelerating at 9.8 m/s². When is the “point of no return” at which the ball has accelerated enough such that it’s no longer possible to stop it from hitting the ground?
The problem is underspecified as stated. If we add the additional information that your means of intervening is a net that can only trap objects falling with less than X kg⋅m/s of momentum, then we can say that the point of no return happens at X/9.8 seconds. But it would be weird to talk about “the second we ball risk reducers lose the ability to significantly reduce the risk of the ball hitting the ground” as if that were an independent pre-existing fact that we could use to determine how strong a net we need to buy, because it depends on the net strength.
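(As a minimal sketch of that toy calculation, in Python: the 1 kg mass and the net’s momentum capacity come from the example above, while the variable and function names are illustrative assumptions of mine. It just makes explicit that the PONR time is a function of the intervention, not of the ball alone.)

```python
# Toy model from the comment above: a 1 kg ball in free fall, and a net that
# can only trap objects carrying less than `net_capacity` kg*m/s of momentum.
# The "point of no return" is whenever the ball's momentum exceeds that
# capacity, so it depends on the net, not just on the ball.

G = 9.8      # free-fall acceleration, m/s^2
MASS = 1.0   # mass of the ball, kg

def momentum_at(t: float) -> float:
    """Momentum of the ball after t seconds of free fall, in kg*m/s."""
    return MASS * G * t

def point_of_no_return(net_capacity: float) -> float:
    """Time (seconds) after which the given net can no longer stop the ball."""
    return net_capacity / (MASS * G)

# Different nets give different "points of no return" for the very same ball.
for capacity in (5.0, 20.0, 100.0):
    t_ponr = point_of_no_return(capacity)
    assert abs(momentum_at(t_ponr) - capacity) < 1e-9  # at the PONR the momentum just matches the net
    print(f"net capacity {capacity:6.1f} kg*m/s -> PONR at t = {t_ponr:5.2f} s")
```

Running it with different capacities gives different PONR times, which is the point: change the net and you change the “point of no return.”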
Thanks! I think I agree with everything you say here except that I’m not annoyed. (Had I been annoyed by my own writing, I would have rewritten it...) Perhaps I’m not annoyed because while my post may have given the misleading impression that PONR was an objective fact about the world rather than a fact about the map of some agent or group of agents, I didn’t fall for that fallacy myself.
To be fair to my original post though, I did make it clear that the PONR is relative to a “we,” a group of people (or even a single person) with some amount of current influence over the future that could diminish to drastically less influence depending on how events go.
I reach for this “bad writing” excuse sometimes, and sometimes it’s plausible, but in general, I’m wary of the impulse to tell critics after the fact, “I agree, but I wasn’t making that mistake,” because I usually expect that if I had a deep (rather than halting, fragmentary, or inconsistent) understanding of the thing that the critic was pointing at, I would have anticipated the criticism in advance and produced different text that didn’t provide the critic with the opportunity, such that I could point to a particular sentence and tell the would-be critic, “Didn’t I already adequately address this here?”
Doesn’t the first sentence (“the day we AI risk reducers lose the ability to significantly reduce AI risk”) address this by explaining PONR as our ability to do something?
(I mean, I agree that finding oneself reaching for a “bad writing” excuse is a good clue that there’s something you can clarify for yourself further; it’s just that this post doesn’t seem like a case of that.)
(Thanks for this—it’s important that critiques get counter-critiqued, and I think that process is stronger when third parties are involved, rather than it just being author vs. critic.)
The reason that doesn’t satisfy me is that I expect the actual calculus of “influence” and “control” in real-world settings to be sufficiently complicated that there’s probably not going to be any usefully identifiable “point of no return”. On the contrary, if there were an identifiable PONR as a natural abstraction, I think that would be a surprising empirical fact about the world in demand of deeper explanation—that the underlying calculus of influence would just happen to simplify that way, such that you could point to an event and say, “There—that’s when it all went wrong”, rather than there just being (say) a continuum of increasingly detailed possible causal graphs that you can compute counterfactuals with respect to (with more detailed graphs being more expensive to learn but granting more advanced planning capabilities).
If you’re pessimistic about alignment—and especially if you have short timelines like Daniel—I think most of your point-of-no-return-ness should already be in the past. When, specifically? I don’t see any reason to expect there to be a simple answer. You lost some measure when OpenAI launched; you lost some measure when Norbert Wiener didn’t drop everything to work on the alignment problem in 1960; you lost some measure when Samuel Butler and Charles Babbage turned out not to be the same person in our timeline; you lost some measure when the ancient Greeks didn’t discover natural selection …
The post does have a paragraph mentioning continuous loss of influence and already-lost influence in the past (“Of course, influence over the future might not disappear all on one day …”), but the reason this doesn’t satisfy me as a critic is that it seems to be treated as an afterthought (“We should keep these possibilities in mind as well”), rather than being the underlying reality to which any putative “PONR” would be a mere approximation. Instead, the rhetorical emphasis is on PONR as if it were an event: “The Date of AI Takeover Is Not the Day the AI Takes Over”. (And elsewhere, Daniel writes about “PONR-inducing tasks”.)
But in my philosophy, “the date” and “the day” of the title are two very different kinds of entities that are hard to talk about in the same sentence. The day AI takes over actually is a physical event that happens on some specific, definite date: nanobots disassemble the Earth, or whatever. That’s not subjective; the AI historian-subprocess of the future will record a definitive timestamp of when it happened. In contrast, “the date” of PONR is massively “subjective” depending on further assumptions; the AI historian-subprocesses of the future will record some sort of summary of the decision-relevant results of a billion billion ancestor simulations, but the answer is not going to fit in a 64-bit timestamp.
Maybe to Daniel, this just looks like weirdly unmotivated nitpicking (“not super relevant to the point [he] was trying to make”)? But it feels like a substantive worldview difference to me.
I’ve read this twice and I’m still not sure whether I actually get your critique. My guess is you’re saying something like:
I agree that the OP’s talk of PONR as a point in time doesn’t make sense; a charitable read is that it’s a toy model that’s supposed to help make it more clear what the difference is between our ability to prevent X and X actually happening (like in the movie Armageddon: did we nuke the asteroid soon enough for it to miss Earth, vs. has the asteroid actually impacted Earth). I agree that asking about “our planning capability” is vague and gives different answers depending on what counterfactuals you’re using; in an extreme case of “what could we feasibly do”, there’s basically no PONR because we always “could” just sit down at a computer and type in a highly speed-prior-compressed source code of an FAI.
It won’t be a timestamp, but it will contain information about humans’ ability to plan. To extract useful lessons from its experience with coming into power surrounded by potentially hostile weak AGIs, a superintelligence has to compare its own developing models across time. It went from not understanding its situation and not knowing what to do to take control from the humans, to understanding and knowing, and along the way it was relevantly uncertain about what the humans were able to do.
Anyway, the above feels like it’s sort of skew to the thrust of the OP, which I think is: “notice that your feasible influence will decrease well before the AGI actually kills you with nanobots, so planning under a contrary assumption will produce nonsensical plans”. Maybe I’m just saying: yes, it’s subjective how much we’re doomed at a given point, and yes, we want our reasoning to be in a sense grounded in stuff actually happening; but also, in order to usefully model in more detail what’s happening and what plans will work, we have to talk about stuff that’s intermediate in time and in abstraction between the nanobot end of the world and the here-and-now. The intermediate stuff then says more specific stuff about when and how much influence you’re losing or gaining.
I don’t think we disagree about anything substantive, and I don’t expect Daniel to disagree about anything substantive after reading this. It’s just—
I don’t think we should be doing charitable readings at yearly review time! If an author uses a toy model to clarify something, we want the post to say “As a clarifying toy model [...]” rather than making the readers figure it out.
I unfortunately was not clear about this, but I meant to define it in such a way that this is false by definition—“loss of influence” is defined relative to the amount of influence we currently have. So even if we had a lot more influence 5 years ago, the PONR is when what little influence we have left mostly dries up. :)
If by some chance this post does make it to further stages of the review, I will heavily edit it, and I’m happy to e.g. add in “As a clarifying toy model...” among other changes.
Perhaps I should clarify then that I don’t actually think my writing was bad. I don’t think it was perfect, but I don’t think the post would have been significantly improved by me having a paragraph or two about how influence (and thus point-of-no-return) is a property of the map, not the territory. I think most readers, like me, knew that already. At any rate it seems not super relevant to the point I was trying to make.