>The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens”.
But obviously these metaphors are not very apt, since humanity kinda has a massive incumbent advantage that would need to be overcome. Rome Sweet Rome is a fun story not because 21st century soldiers and Roman legionnaires are intrinsically equals, but because the technologically superior side starts out facing down a massive incumbent power.
One thing that I’ve always found a bit handwavey about the hard takeoff scenarios is that they tend to assume that a superintelligent AI would actually be able to plot out a pathway from being in a box to eliminating humanity that is basically guaranteed to succeed. These stories tend to involve the assumption that the AI will be able to invent highly-potent weapons very quickly and without risk of detection, but it seems at least pretty plausible that... this is just too difficult. I just think it’s likely that we’ll see several failed AI takeover attempts before a success occurs, and hopefully we’ll learn something from these early problems that will slow things down.
I just want to be clear I understand your “plan”.

We are going to build a powerful self-improving system, then let it try to end humanity with some p(doom) < 1 (hopefully), and then do that iteratively?
My gut reaction to a plan like that looks like this: “Eff you. You want to play Russian roulette? Fine, do that on your own. But leave me and everyone else out of it.”
>AI will be able to invent highly-potent weapons very quickly and without risk of detection, but it seems at least pretty plausible that... this is just too difficult
You lack imagination; it’s painfully easy, and the cost and required IQ have been dropping steadily every year.

And no, there is zero chance I will elaborate on any of the possible ways humanity could purposefully be wiped out.

I outlined my expectations, not a “plan”.
>You lack imagination; it’s painfully easy, and the cost and required IQ have been dropping steadily every year.
Conversely, it’s possible that doomers are suffering from an overabundance of imagination here. To be a bit blunt, I don’t take it for granted that an arbitrarily smart AI would be able to manipulate a human into developing a supervirus or nanomachines in a risk-free fashion.
The fast takeoff doom scenarios seem like they should be subject to Drake equation-style analyses to determine P(doom). Even if we develop malevolent AIs, I’d say that P(doom | AGI tries to harm humans) is significantly less than 100%. Obviously, humans detecting an attempt would not necessarily prevent future incidents, but I’d expect enough of a response that I don’t see how people could put P(doom) at 95% or more.
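To make the Drake-equation framing concrete, here is a minimal sketch with made-up, purely illustrative probabilities for the conjunctive steps a single takeover attempt would have to get through; none of these numbers come from the thread above.

```python
# Illustrative only: multiply hypothetical conditional probabilities for one
# takeover attempt, Drake-equation style. Every number here is made up.
p_tries_takeover   = 0.5   # P(an AGI actually attempts a takeover)
p_viable_plan      = 0.4   # P(it finds a workable physical pathway | attempt)
p_evades_detection = 0.3   # P(it stays undetected long enough | viable plan)
p_execution_works  = 0.5   # P(the plan succeeds once set in motion | undetected)

p_doom_single_attempt = (p_tries_takeover * p_viable_plan
                         * p_evades_detection * p_execution_works)
print(f"P(doom from one attempt) ~ {p_doom_single_attempt:.3f}")  # ~0.030
```

The point is only that conjunctive steps multiply down; the individual numbers are placeholders, not estimates.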
Well, as Eliezer said, today you can literally order custom DNA strings by email, as long as they don’t match anything in the “known dangerous virus” database.
And the AI’s task is a little easier than you might suspect, because it doesn’t need to be able to fool everyone into doing arbitrary weird stuff, or even most people. If it can do ordinary Internet things like “buy stuff on Amazon.com”, then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.
>then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.
Yes, but do I take it for granted that an AI will be able to manipulate a human into creating a virus that will kill literally everyone on Earth, or at least a sufficient number to allow the AI to enact some secondary plans to take over the world? Without being detected? Not with anywhere near 100% probability. I just think these sorts of arguments should be subject to Drake equation-style reasoning that will dilute the likelihood of doom under most circumstances.
This isn’t an argument for being complacent. But it does allow us to push back against the idea that “we only have one shot at this.”
I mean, the human doesn’t have to know that they’re creating a doomsday virus. The AI could be promising them a cure for their daughter’s cancer, or something.
Or just promising the human some money, with the sequence of actions set up to obscure that anything important is happening. (E.g., you can use misdirection like ‘the actually important event occurred early in the process, when you opened a test tube to add some saline and thereby allowed the contents of the test tube to start propagating into the air; the later step where you mail the final product to an address you were given, or record an experimental result in a spreadsheet and email the spreadsheet to your funder, doesn’t actually matter for the plan’.)
Getting humans to do things is really easy, if they don’t know of a good reason not to do it. It’s sometimes called “social engineering”, and sometimes it’s called “hiring them”.
You have to weigh the conjunctive aspects of particular plans against the disjunctiveness of ‘there are many different ways to try to do this, including ways we haven’t thought of’.
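To put rough, entirely hypothetical numbers on that tension: each individual plan is conjunctive, so its steps multiply down to a small per-plan probability, but having many distinct plans (or repeated attempts) is disjunctive, and the chance that at least one works is 1 − ∏(1 − p_i). A minimal sketch, assuming the plans succeed or fail independently (itself an assumption):

```python
# Hypothetical numbers only: each plan is a long shot on its own, but doom is
# avoided only if every plan fails.
from math import prod

plan_success_probs = [0.02, 0.01, 0.05, 0.03]  # per-plan success, all made up

# Assumes independence across plans, which is itself an assumption.
p_all_fail = prod(1 - p for p in plan_success_probs)
p_at_least_one_succeeds = 1 - p_all_fail
print(f"P(at least one plan works) ~ {p_at_least_one_succeeds:.3f}")  # ~0.106
```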
>To be a bit blunt, I don’t take it for granted that an arbitrarily smart AI would be able to manipulate a human into developing a supervirus or nanomachines in a risk-free fashion.
How did you reach that conclusion? What does that ontology look like?
>The fast takeoff doom scenarios seem like they should be subject to Drake equation-style analyses to determine P(doom). Even if we develop malevolent AIs, I’d say that P(doom | AGI tries to harm humans) is significantly less than 100%. Obviously, humans detecting an attempt would not necessarily prevent future incidents, but I’d expect enough of a response that I don’t see how people could put P(doom) at 95% or more.
What is your p(doom)? Is that acceptable? If yes, why is it acceptable? If no, what is the acceptable p(doom)?
Remember when some people, just to see what would happen, modified a “drug discovery” AI system to search for maximally toxic molecules instead of minimizing toxicity, and it ended up “inventing” molecules very similar to VX nerve gas?