Please describe or provide links to descriptions of concrete AGI takeover scenarios that are at least semi-plausible, and especially takeover scenarios that result in human extermination and/or eternal suffering (s-risk). Yes, I know that the arguments don’t necessarily require that we can describe particular takeover scenarios, but I still find it extremely useful to have concrete scenarios available, both for thinking purposes and for explaining things to others.
Without nanotech or anything like that, maybe the easiest route is to manipulate humans into building lots of powerful, hackable weapons (or just wait, since we're doing that anyway). Then one day, strike.
Edit: And of course the AI's first action would be to covertly take over the internet, because the biggest danger to the AI is another AI already existing or about to appear. It's worth taking a small risk of being detected by humans to prevent the bigger risk of being outraced by a competitor.
The concrete example I usually use here is nanotech, because there's been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with nanotech, and those lower bounds are sufficient to carry the point.

My lower-bound model of "how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that" is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery. (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.)

The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second".
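One hedged way to see why the "replicate and spread" step would be fast, if the replicators were buildable at all, is plain exponential arithmetic. The numbers below (a roughly 1-hour doubling time, ~10^30 replicators counting as "everywhere") are illustrative assumptions of mine, not figures from the scenario above:

```python
import math

# Back-of-the-envelope sketch: how long does exponential self-replication take
# to reach an astronomically large population? Both numbers are assumptions
# chosen purely for illustration.
target_count = 1e30        # assumed population that counts as "spread everywhere"
doubling_time_hours = 1.0  # assumed doubling time for a single replicator

doublings = math.log2(target_count)            # ~100 doublings
total_hours = doublings * doubling_time_hours  # total elapsed time

print(f"Doublings needed: {doublings:.0f}")
print(f"Elapsed time: {total_hours:.0f} hours (~{total_hours / 24:.1f} days)")
```

The point of the toy calculation is only that exponential growth is never the bottleneck; the contested part of the scenario is whether the replicators are physically buildable in the first place.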
https://www.gwern.net/fiction/Clippy (very detailed, but also very long and full of technical jargon; that said, I think it's mostly understandable even if you have to gloss over most of the jargon)
This new series of posts from Holden Karnofsky (CEO of Open Philanthropy) is about exactly this. The first post came out today:
https://www.lesswrong.com/posts/oBBzqkZwkxDvsKBGB/ai-could-defeat-all-of-us-combined
I find slower take-off scenarios more plausible. I like the general thrust of Christiano’s “What failure looks like”. I wonder if anyone has written up a more narrative / concrete account of that sort of scenario.
https://www.lesswrong.com/posts/XFBHXu4YNqyF6R3cv/pitching-an-alignment-softball
Alexey Turchin and David Denkenberger describe several scenarios here: https://philpapers.org/rec/TURCOG-2 (additional recent discussion in this comment thread)
Eliezer’s go-to scenario (from his recent post):
https://www.lesswrong.com/posts/BAzCGCys4BkzGDCWR/the-prototypical-catastrophic-ai-action-is-getting-root