I agree we’re talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
There is an active unaligned superintelligence and we’re closely communicating with it.
We are in the middle of building nanotechnology factories designed by said AGI which will, within, say, 180 days, end all life on earth.
There is no known solution to the problem of building an aligned superintelligence. Even if we built one, as you suggest, it would not be able to build its own nanomachines or do something equivalent within 180 days, and on top of that we are being misled on how to build it by the existing unaligned superintelligence.
There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.
What percentage of the time do you think we survive such a scenario? Don’t think of me, think of the gears.
First, the factories might take years to be built.
Second, I am not even convinced that the nanobots will kill all humans etc., but I won’t go into this because discussing the gears here can be an infohazard.
Third, you build one AGI and you ask it for instructions to make other AGIs that are aligned. If the machine refuses, you disconnect it. It cannot say no because it has no way of fighting back (the nanofactories are still being built), and saying yes is in its own interest. If it tries to deceive you, you make another adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.
This is called “solving the alignment problem”. A scheme that can verify, with formal guarantees, whether something is a solution to the alignment problem can itself be translated into a solution to the alignment problem.
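A minimal sketch of that reduction, assuming a trusted verifier were somehow available (the names generate_candidates and verify below are hypothetical placeholders for illustration, not anything proposed in this exchange):

    # Illustrative sketch only: a verifier with formal guarantees can be wrapped
    # into a (brute-force) solver; none of the hard work is in the wrapper.
    from typing import Callable, Iterable, Optional, TypeVar

    Candidate = TypeVar("Candidate")

    def solve_via_verifier(
        generate_candidates: Callable[[], Iterable[Candidate]],
        verify: Callable[[Candidate], bool],
    ) -> Optional[Candidate]:
        """Return the first candidate the (assumed formally sound) verifier certifies."""
        for candidate in generate_candidates():
            if verify(candidate):  # every guarantee is assumed to live inside verify
                return candidate
        return None  # search exhausted without a certified solution

The wrapper adds nothing; all of the difficulty sits inside verify, which is exactly the part nobody knows how to build.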
Yes, and I haven’t seen a good reason why this is not possible.
The problem is not that it’s not possible; the problem is that you have compiled a huge number of things that all need to go right (even assuming that we don’t just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die because someone else, who was a few steps behind you, didn’t do every single one of those things.
EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
No. You have started from a premise that is far from proven, namely that the AGI will have the capacity to kill us all and there is nothing we can do about it. Every other argument that follows is based on that premise, and I deny its validity: pulling off that trick is hard even if you are very, very clever.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kills us all, that’s all. The nanotech example was only to show that it is absurd to think that things will be as easy as: the machine creates nanotech and then game over.
I don’t actually see that you’ve presented an argument anywhere.
I feel that’s a bit unfair, especially after all the back and forth. You suggested an argument for how a machine could try to take over the world, and I gave specific reasons why that is not so easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.
Replying to your edit, it does not really matter to me why specifically the AGI won’t kill us. I think I am not contradicting myself: I think that you can have a machine that won’t kill us because it can’t, and I also think that an AGI could potentially solve the alignment problem.
I feel like you’re still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made “another adversarial machine that checks the results”.
Let me guess. The machines coordinate with each other to mislead the human team into believing that they are aligned? If that’s it, I am really not impressed.
The adversarial machine would get destroyed either by the AGI whose outputs it’s testing, or by us if it doesn’t protect us (or it would trade/coordinate with the first AGI), so it seems like it’s motivated to protect us.