If you can build one aligned superintelligence, then plausibly you can:

1. explain to other AGI developers how to make theirs safe or even just give them a safe design (maybe homomorphically encrypted to prevent modification, but they might not trust that), and
2. have aligned AGI monitor the internet and computing resources, and alert authorities to anomalies that might signal new AGI developments. Require that new AGI projects provide proof that they were designed according to one of a set of approved designs, or pass some tests determined by your aligned superintelligence (a toy sketch of such a check is below).

Then aligned AGI can proliferate first and unaligned AGI will plausibly face severe barriers.
Plausibly 1 is enough, since there’s enough individual incentive to build something safe or to copy other people’s designs and save work. 2 depends on cooperation with authorities and, I’d guess, cloud computing service providers or policymakers.
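To make 2 a bit more concrete: the “approved designs” check could take many forms, and I’m not committing to any particular one. Here is a deliberately minimal sketch, assuming the check is just a hash allowlist maintained by the aligned superintelligence; all names like `APPROVED_DESIGN_HASHES` and `is_approved` are made up for illustration, and a real “proof of safe design” would obviously need to be much stronger than a byte-for-byte hash match.

```python
# Toy sketch of the "approved designs" check from point 2 (hypothetical,
# not a real protocol): the monitor keeps SHA-256 digests of vetted designs
# and flags anything that doesn't match one of them.
import hashlib

# Placeholder entries; real entries would be digests of designs vetted by
# the aligned superintelligence.
APPROVED_DESIGN_HASHES: set[str] = {
    "placeholder-digest-of-vetted-design-1",
    "placeholder-digest-of-vetted-design-2",
}

def design_fingerprint(design_artifact: bytes) -> str:
    """Hash the serialized design artifact submitted by a developer."""
    return hashlib.sha256(design_artifact).hexdigest()

def is_approved(design_artifact: bytes) -> bool:
    """True only if the submitted design is byte-identical to a vetted design."""
    return design_fingerprint(design_artifact) in APPROVED_DESIGN_HASHES

if __name__ == "__main__":
    submitted = b"...serialized AGI design..."
    if not is_approved(submitted):
        print("Not an approved design; flag for the tests / review step.")
```

The interesting questions in the rest of this thread are less about the mechanics of such a check and more about whether other developers and authorities would submit to it at all.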
explain to other AGI developers how to make theirs safe or even just give them a safe design (maybe homomorphically encrypted to prevent modification, but they might not trust that)
What if the next would-be AGI developer rejects your “explanation”, and has their own great ideas for how to make an even better next-gen AGI that they claim will work better, and so they discard your “gift” and proceed with their own research effort?
I can think of at least two leaders of would-be AGI development efforts (namely Yann LeCun of Meta and Jeff Hawkins of Numenta) who believe (what I consider to be) spectacularly stupid things about AGI x-risk, and have believed those things consistently for decades, despite extensive exposure to good counter-arguments.
Or what if the next would-be AGI developer agrees with you and accepts your “gift”, and so does the one after that, and the one after that, but not the twelfth one?
have aligned AGI monitor the internet and computing resources, and alert authorities to anomalies that might signal new AGI developments. Require that new AGI projects provide proof that they were designed according to one of a set of approved designs, or pass some tests determined by your aligned superintelligence.
What if the authorities don’t care? What if the authorities in most countries do care, but not the authorities in every single country? (For example, I’d be surprised if Russia would act on friendly advice from USA politicians to go arrest programmers and shut down companies.)
What if the only way to “monitor the internet and computing resources” is to hack into every data center and compute cluster on the planet? (Including those in secret military labs.) That’s very not legal, and very not in the Overton window, right? Can you really imagine DeepMind management approving their aligned AGI engaging in those activities? I find that hard to imagine.
When you ask “what if”, are you implying these things are basically inevitable? And inevitable no matter how much more compute aligned AGIs have before unaligned AGIs are developed and deployed? How much of a disadvantage against aligned AGIs does an unaligned AGI need before doom isn’t overwhelmingly likely? What’s the goal post here for survival probability?
You can have AGIs monitoring for pathogens, nanotechnology, other weapons, and building defenses against them, and this could be done locally and legally. They can monitor transactions and access to websites through which dangerous physical systems (including possibly factories, labs, etc.) could be taken over or built. Does every country need to be competent and compliant to protect just one country from doom?
The Overton window could also shift dramatically if omnicidal weapons are detected.
I agree that plausibly not every country with significant compute will comply, and hacking everyone is outside the public Overton window. I wouldn’t put hacking everyone past the NSA, but also wouldn’t count on them either.
When you ask “what if”, are you implying these things are basically inevitable?
Let’s see, I think “What if the next would-be AGI developer rejects your “explanation” / “gift”” has a probability that asymptotes to 100% as the number of would-be AGI developers increases. (Hence “Claim B” above becomes relevant.) I think “What if the authorities in most countries do care, but not the authorities in every single country?” seems to have high probability in today’s world, although of course I endorse efforts to lower the probability. I think “What if the only way to “monitor the internet and computing resources” is to hack into every data center and compute cluster on the planet? (Including those in secret military labs.)” seems very likely to me, conditional on “Claim B” above.
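To spell out the “asymptotes to 100%” point with a toy calculation (the acceptance probability here is a made-up number, just for illustration): if each would-be developer independently accepts the safe design with probability p, then all N of them accepting has probability p^N, so the probability of at least one defector is 1 − p^N, which goes to 1 as N grows.

```python
# Toy illustration of "probability asymptotes to 100% as the number of
# would-be AGI developers increases". p_accept = 0.9 is an arbitrary
# assumption, not an estimate from this discussion.
def prob_at_least_one_defector(p_accept: float, n_developers: int) -> float:
    """Chance that at least one of n independent developers rejects the safe design."""
    return 1 - p_accept ** n_developers

for n in (1, 5, 12, 50, 200):
    print(n, round(prob_at_least_one_defector(0.9, n), 3))
# 1 -> 0.1, 5 -> 0.41, 12 -> 0.718, 50 -> 0.995, 200 -> 1.0
```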
You can have AGIs monitoring for pathogens, nanotechnology, other weapons, and building defenses against them, and this could be done locally and legally.
Hmm.
Offense-defense balance in bio-warfare is not obvious to me. Preventing a virus from being created would seem to require 100% compliance by capable labs, but I’m not sure how many “capable labs” there are, or how geographically distributed and rule-following. Once the virus starts spreading, aligned AGIs could help with vaccines, but apparently a working COVID-19 vaccine was created in 1 day, and that didn’t help much, for various societal coordination & governance reasons. So then you can say “Maybe aligned AGI will solve all societal coordination and governance problems”. And maybe it will! Or, maybe some of those coordination & governance problems come from blame-avoidance and conflicts-of-interest and status-signaling and principal-agent problems and other things that are not obviously solvable by easy access to raw intelligence. I don’t know.
Offense-defense balance in nuclear warfare is likewise not obvious to me. I presume that an unaligned AGI could find a way to manipulate nuclear early warning systems (trick them, hack into them, bribe or threaten their operators, etc.) to trigger all-out nuclear war, after hacking into a data center in New Zealand that wouldn’t be damaged. An aligned AGI playing defense would need to protect against these vulnerabilities. I guess the bad scenario that immediately jumps into my mind is that aligned AGI is not ubiquitous in Russia, such that there are still bribe-able / trickable individuals working at radar stations in Siberia, and/or that military people in some or all countries don’t trust the aligned AGI enough to let it anywhere near the nuclear weapons complex.
Offense-defense balance in gray goo seems very difficult for me to speculate about. (Assuming gray goo is even possible.) I don’t have any expertise here, but I would assume that the only way to protect against gray goo (other than prevent it from being created) is to make your own nanobots that spread around the environment, which seems like a thing that humans plausibly wouldn’t actually agree to do, even if it was technically possible and an AGI was whispering in their ear that there was no better alternative. Preventing gray goo from being created would (I presume) require 100% compliance by “capable labs”, and as above I’m not sure what “capable labs” actually look like, how hard they are to create, what countries they’re in, etc.
To be clear, I feel much less strongly about “Pivotal act is definitely necessary”, and much more strongly that this is something where we need to figure out the right answer and make it common knowledge. So I appreciate this pushback!! :-) :-)
Some more skepticism about infectious diseases and nukes killing us all here: https://www.lesswrong.com/posts/MLKmxZgtLYRH73um3/we-will-be-around-in-30-years?commentId=DJygArj3sj8cmhmme
Also my more general skeptical take against non-nano attacks here: https://www.lesswrong.com/posts/MLKmxZgtLYRH73um3/we-will-be-around-in-30-years?commentId=TH4hGeXS4RLkkuNy5
With nanotech, I think there will be tradeoffs between targeting effectiveness and reliance on (EM) signals from computers, and those signals can be effectively interfered with through things within or closer to the Overton window. Maybe a crux is how good autonomous nanotech with no remote control would be at targeting humans or spreading so much that it just gets into almost all buildings or food or water because it’s basically going everywhere.
Thanks!
I wasn’t assuming the infectious diseases and nukes by themselves would kill us all. They don’t have to, because the AGI can do other things in conjunction, like take command of military drones and mow down the survivors (or bomb the PPE factories), or cause extended large-scale blackouts, which would incidentally prevent PPE production and distribution, along with pretty much every other aspect of an organized anti-pandemic response.
See Section 1.6 here.
So that brings us to the topic of offense-defense balance for illicitly taking control of military drones. And I would feel concerned about substantial delays before the military trusts a supposedly-aligned AGI so much that they give it root access to all its computer systems (which in turn seems necessary if the aligned AGI is going to be able to patch all the security holes, defend against spear-phishing attacks, etc.) Of course there’s the usual caveat that maybe DeepMind will give their corrigible aligned AGI permission to hack into military systems (for their own good!), and then maybe we wouldn’t have to worry. But the whole point of this discussion is that I’m skeptical that DeepMind would actually give their AGI permission to do something like that.
And likewise we would need to talk about offense-defense balance for the power grid. And I would have the same concern about people being unwilling to give a supposedly-aligned AGI root access to all the power grid computers. And I would also be concerned about other power grid vulnerabilities like nuclear EMPs, drone attacks on key infrastructure, etc.
And likewise, what’s the offense-defense balance for mass targeted disinformation campaigns? Well, if DeepMind gives its AGI permission to engage in a mass targeted counter-disinformation campaign, maybe we’d be OK on that front. But that’s a big “if”!
…And probably dozens of other things like that.
Maybe a crux is how good autonomous nanotech with no remote control would be at targeting humans or spreading so much that it just gets into almost all buildings or food or water because it’s basically going everywhere.

Seems like a good question, and maybe difficult to resolve. Or maybe I would have an opinion if I ever got around to reading Eric Drexler’s books etc. :)
I think there would be too many survivors, and too much manned defense capability left, for existing drones to directly kill the rest of us with high probability. Blocking PPE production and organized pandemic responses still won’t stop people from self-isolating, doing no-contact food deliveries, etc., although things would be tough, and deliveries and food production would be good targets for drone strikes. It could be bad if lethal pathogens become widespread and practically unremovable in our food/water, or if food production is otherwise consistently attacked, but the militaries would probably step in to protect the food/water supplies.
I think, overall, there are too few ways to reliably kill double-digit or even single-digit percentages of the human population with high probability that could be combined to get basically everyone with high probability. I’m not saying there aren’t any, but I’m skeptical that there are enough. There are diminishing returns on doing the same ones (like pandemics) repeatedly, because of resistance, and because enough people will be personally very careful or otherwise difficult targets.
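To put rough (entirely made-up) numbers on the “too few ways to combine” intuition: even if you grant an unaligned AGI several independent mechanisms that each kill a sizable fraction of the remaining population, the surviving fraction only shrinks geometrically, so a handful of such mechanisms still leaves billions alive, and that’s before accounting for the diminishing returns and hardened survivors mentioned above. A sketch, with assumed kill fractions that are not estimates from this thread:

```python
# Back-of-the-envelope illustration of the "combining attacks" argument.
# The kill fractions are made-up assumptions, and the mechanisms are
# (unrealistically) treated as independent with no diminishing returns,
# which makes this an upper bound on combined lethality.
world_population = 8_000_000_000

# Hypothetical mechanisms and the fraction of the *remaining* population
# each one kills.
attacks = {
    "engineered pandemic": 0.30,
    "nuclear war + fallout/famine": 0.20,
    "drone strikes on infrastructure": 0.05,
    "grid/blackout-driven collapse": 0.10,
}

survivors = float(world_population)
for name, kill_fraction in attacks.items():
    survivors *= (1 - kill_fraction)
    print(f"after {name}: ~{survivors:,.0f} survivors")
# Stacking all four: 8e9 * 0.7 * 0.8 * 0.95 * 0.9 ≈ 3.8 billion remain,
# i.e. nowhere near "basically everyone" dead.
```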