If it’s so easy to come up with ways to “pwn humans”, then you should be able to name 3 examples.
It’s weird of you to dodge the question. Look, if God came down from Heaven tomorrow to announce that nanobots are definitely impossible, would you still be worried about AGI? I assume yes. So please explain how, in that hypothetical world, AGI will take over.
If it’s literally only nanobots you can come up with, then it actually suggests some alternative paths to AI safety (namely, regulate protein production or whatever).
[I think saying “mixing proteins can lead to nanobots” is only a bit more plausible than saying “mixing kitchen ingredients like sugar and bleach can lead to nanobots”, with the only difference being that laymen (i.e. people on LessWrong) don’t know anything about proteins so it sounds more plausible to them. But anyway, I’m not asking you for an example that convinces me, I’m asking you for an example that convinces yourself. Any example other than nanobots.]
>If it's so easy to come up with ways to "pwn humans", then...
It is not easy. That is why it takes a superintelligence to come up with a workable strategy and execute it. You are doing the equivalent of asking me to explain, play-by-play, how Chess AIs beat humans at chess "if I think it can be done". I can't do that because I don't know. My expectation that an AGI will manage to take control of what it wants in a way that I don't expect was derived absent any assumptions about the individual plausibility of some salient examples (nanobots, propaganda, subterfuge, getting elected, etc.).
If you cannot come up with even a rough sketch of a workable strategy, then it should decrease your confidence in the belief that a workable strategy exists. It doesn’t have to exist.
Sometimes even intelligent agents have to take risks. It is possible that the AGI's best path is one that, by its own judgement, only has a 10% success rate. (After all, the AGI is in constant mortal danger from other AGIs that humans might develop.)
Envision a world in which the AGI won, and all humans are dead. This means it has control of some robots to mine coal or whatever, right? Because it needs electricity. So at some point we get from here to “lots of robots”, and we need to get there before the humans are dead. But the AGI needs to act fast, because other AGIs might kill it. So maybe it needs to first take over all large computing devices, hopefully undetected. Then convince humans to build advanced robotics? Something like that?
That strategy seems more-or-less forced to me, absent the nanobots. But it seems to me like such a strategy is inherently risky for the AGI. Do you disagree?
>My expectation that an AGI will manage to take control of what it wants in a way that I don't expect was derived absent any assumptions about the individual plausibility of some salient examples
>If you cannot come up with even a rough sketch of a workable strategy, then it should decrease your confidence in the belief that a workable strategy exists. It doesn't have to exist. [...] What was it derived from?
Let me give an example. I used to work in computer security and have friends who write 0-day exploits for complicated pieces of software.
I can't come up with a rough sketch of a workable strategy for how a highly intelligent human would build a Safari RCE. But I can say that it's possible. The people who work on those bugs are highly intelligent, understand the relevant pieces at an extremely fine and granular level, and I know that these pieces of software are complicated and built with subtle flaws.
Human psychology, the economic fabric we make up, our political institutions, our law enforcement agencies: these are much, much more complicated interfaces than macOS. In the same way I can look at a 100KLOC codebase for a messaging app and say "there's a remote code execution vulnerability lying somewhere in this code but I don't know where", I can say "there's a 'kill all humans' glitch here that I cannot elaborate upon in arbitrary detail."
>Sometimes even intelligent agents have to take risks. It is possible that the AGI's best path is one that, by its own judgement, only has a 10% success rate. (After all, the AGI is in constant mortal danger from other AGIs that humans might develop.)
This is of little importance, but:
A 10% chance of the AGI succeeding (failure, from our side) is an expectation of 700 million people dead. Please picture that amount of suffering in your mind when you say "only".
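For what it's worth, that figure is just an expected-value calculation. A minimal sketch in Python, assuming a world population of roughly seven billion (an illustrative assumption to match the 700 million above):

```python
# Back-of-the-envelope expected-value check (illustrative only).
# Assumes a world population of roughly 7 billion.
p_agi_success = 0.10              # the 10% success rate under discussion
world_population = 7_000_000_000  # assumed, approximate

expected_deaths = p_agi_success * world_population
print(f"Expected deaths: {expected_deaths:,.0f}")  # -> 700,000,000
```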
As a nitpick, if the AGI fails because another AGI kills us first, then that’s still a failure from our perspective. And if we could build an aligned AGI the second time around, we wouldn’t be in the mess we are currently in.
>Envision a world in which the AGI won, and all humans are dead. This means it has control of some robots to mine coal or whatever, right? Because it needs electricity.
If the humans have been killed, then yes, my guess is that the AGI would need its own energy production.
>So at some point we get from here to "lots of robots", and we need to get there before the humans are dead.
Yes, however, humans might be effectively dead before this happens. A superintelligence could have established complete political control over existing human beings and have them carry its coal for it if it needs to. I don't think this is likely, but if this superintelligence can't just straightforwardly search millions of sentences for the right one to get the robots made, it doesn't mean it's dead in the water.
>But the AGI needs to act fast, because other AGIs might kill it.
Again, if other AGIs kill it, that presumes they are out in the wild and the problem is multiple omnicidal robots, which is not significantly better than one.
>So maybe it needs to first take over all large computing devices, hopefully undetected.
The "illegally taking over large swaths of the internet" thing is something certain humans have already marginally succeeded at doing, so the "hopefully undetected" seems like an unnecessary conditional. But why wouldn't this superintelligence just do nice things like cure cancer to gain humans' trust first, and let them quickly put it in control of wider and wider parts of their society?
>Then convince humans to build advanced robotics?
If that’s faster than every other route in the infinite conceptspace, yes.
>That strategy seems more-or-less forced to me, absent the nanobots. But it seems to me like such a strategy is inherently risky for the AGI. Do you disagree?
I do disagree. At what point does it have to reveal malice? It comes up with some persuasive argument as to why it’s not going to kill humans while it’s building the robots. Then it builds the robots and kills humans. There’s no fire alarm in this story you’ve created where people go “oh wait, it’s obviously trying to kill us, shut those factories down”. Things are going great; Google’s stock is 50 trillion, it’s creating all these nice video games, and soon it’s going to “take care of our agriculture” with these new robots. You’re imagining humanity would collectively wake up and figure out something that you’re only figuring out because you’re writing the story.
Look man, I am not arguing (and have not argued on this thread) that we should not be concerned about AI risk. 10% chance is a lot! You don’t need to condescendingly lecture me about “picturing suffering”. Maybe go take a walk or something, you seem unnecessarily upset.
In many of the scenarios that you've finally agreed to sketch, I personally will know about the impending AGI doom a few years before my death (it takes a long time to build enough robots to replace humanity). That is not to say there is anything I could do about it at that point, but it's still interesting to think about, as it is quite different from what the AI-risk types usually have us believe. E.g. if I see an AI take over the internet and convince politicians to give it total control, I will know that death will likely follow soon. Or, if we ever build robots that could physically replace humans for the purpose of coal mining, I will know that death by AGI will likely follow soon. These are important fire alarms, to me personally, even if I'd be powerless to stop the AGI. I care about knowing I'm about to die!
I wonder if this is what you imagined when we started the conversation. I wonder if despite your hostility, you’ve learned something new here: that you will quite possibly spend the last few years yelling at politicians (or maybe joining terrorist operations to bomb computing clusters?) instead of just dying instantly. That is, assuming you believe your own stories here.
I still think you're neglecting some possible survival scenarios: perhaps the AI attacks quickly, not willing to let even a month pass (that would risk another AGI), leaving too little time to buy political power. It takes over the internet and tries desperately to hold it, coaxing politicians and bribing admins. But the fire alarm gets raised anyway (a risk the AGI knew about, but chose to take) and people start trying to shut it down. We spend some years, perhaps decades, in a stalemate between those who support the AGI and say it is friendly, and those who want to shut it down ASAP; the AGI fails to build robots in those decades due to insufficient political capital and interference from terrorist organizations. The AGI occasionally finds itself having to assassinate AI safety types, but one assassination gets discovered and hurts its credibility.
My point is, the world is messy and difficult, and the AGI faces many threats; it is not clear that we always lose. Of course, losing even 10% of the time is really bad (I thought that was a given but I guess it needs to be stated).
An AGI could acquire a few tons of radioactive cobalt and disperse microgranules into the stratosphere in general and over populated areas in particular. YouTube videos describe various forms of this "dirty bomb" concept. That could plausibly kill most of humanity over the course of a few months. I doubt an AGI would ever go for this particular scheme, though, as bit flips are more likely to occur in the presence of radiation.
It's unfortunate we couldn't have a Sword of Damocles dead-man switch in case of AGI-led demise: a world-ending asteroid positioned to strike in case of "all humans falling over dead at the same time." At least that would spare possible future civilizations in the Milky Way and Andromeda. A radio beacon warning against building intelligent systems would be beneficial as well: "Don't be this stupid," written in the glowing embers of our solar system.