“Because the EA community does not control the major research labs”
Fine, replace that with any lab that cares about AGI. Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem? Is that really harder to imagine than a super-elaborate plan to kill all humans?
“Doesn’t know how to use a misaligned AGI safely to do that.”
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed, hasn’t it?
“It’s not that we won’t try. It’s that we’re unable”
Is there the equivalent of a Gödel theorem proving this? If not, how are you so sure?
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off, because that would lead to something besides us being dead. It will make sure any such plan is either ineffective or will, for some reason, not in practice lead to an aligned AGI, because the nanomachines will get to us first.
Don’t we have the resources and people to set up such a lab? If you think we don’t have the compute (and couldn’t get access to enough cloud compute or wouldn’t want to), that’s something we could invest in now, since there’s still time. Also, if there are still AI safety teams at any of the existing big labs, can’t they start their own projects there?
At present, not by a long shot. And doing so would probably make the problem worse; if we didn’t solve the underlying problem, DeepMind would do whatever it was going to do anyway, except faster.
I find it incredible that people can conceive of AGI-designed nanofactories built in months but cannot imagine that a big lab could spend some spare time or money looking into this, especially when there are probably people from those companies who are frequent LW readers.
I might have missed it, but this seems to be the first time you’ve mentioned “months” in your scenario. Wasn’t it “days” before? It matters, because I don’t think it would take months for an AGI to build a nanotech factory.
Son, I wrote an entire longform explaining why we need to attempt this. It’s just hard. The ML researchers and Google executives who are relevant to these decisions have a financial stake in speeding capabilities research along as fast as possible, and often have very dead-set views about AGI risk advocates being cultists or bikeshedders or alarmists. There is an entire community stigma against even talking about these issues in the neck of the woods you speak of. I agree that redirecting money from capabilities research to anything called alignment research would be good on net, but the problem is finding clear ways of doing that.
I don’t think it’s impossible! If you want to help, I can give you some tasks to start with. But we’re already trying.
Too busy at the moment, but if you remind me of this in a few months’ time, I may. Thanks.
Can we build a non-agential AGI to solve alignment? Or just a very smart task-specific AI?
If you come up with a way to build an AI that hasn’t crossed the Rubicon of dangerous generality, but can solve alignment, that would be very helpful. It doesn’t seem likely to be possible without already knowing how to solve alignment.
Why is this?
You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don’t know how to formalize the alignment problem in a way that we can feed it into a theorem prover.
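To make the contrast concrete, here is a toy sketch in Lean 4 (the `Aligned` predicate and `chosenPolicy` are placeholders I am inventing, not anything that exists): the first kind of statement can be handed to a prover as-is, while the statement we actually care about cannot even be written down.

```lean
-- A claim like this has a precise formal statement a prover can be pointed at:
theorem two_plus_two : 2 + 2 = 4 := rfl

-- The statement we would actually need has no agreed formalization, so there is
-- nothing to feed the prover. `Aligned` and `chosenPolicy` are undefined placeholders:
-- theorem chosen_policy_is_aligned : Aligned chosenPolicy := sorry
```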
A model that can “solve alignment” for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.
We might be able to formalize some pieces of the alignment problem, as MIRI tried with corrigibility. Vanessa Kosoy has some more formal work, too. Do you think there are no useful pieces to formalize? Or that all the pieces we try to formalize won’t together be enough even if we had solutions to them?
Also, even if it explicitly models humans, would it need to be consequentialist? Could we just have a powerful modeller trained to minimize prediction loss or whatever? The search space may be huge, but having a powerful modeller still seems plausibly useful. We could also filter options, possibly with a separate AI, not necessarily an AGI.
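Roughly the shape I have in mind; every name below is a hypothetical stand-in rather than any real system or API:

```python
from typing import Callable, List

def propose_and_filter(
    predictive_model: Callable[[str], List[str]],  # hypothetical: pure predictor trained only on prediction loss
    safety_filter: Callable[[str], bool],          # hypothetical: separate, narrower model that screens proposals
    prompt: str,
) -> List[str]:
    """The 'powerful modeller plus separate filter' idea: the predictor only
    proposes candidate plans, a second system screens them, and humans only
    ever see what passes the screen."""
    proposals = predictive_model(prompt)
    return [p for p in proposals if safety_filter(p)]
```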
I don’t see why not, but there is probably an unfalsifiable reason why this is impossible, and I am looking forward to reading it.
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about it?
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
What about the above situation looks, in the moment, to someone like Yann LeCun or Demis Hassabis like it should change their mind? The AGI isn’t saying “I’m going to kill you all”. It’s delivering persuasive and cogent arguments as to why misaligned intelligence is impossible, curing cancer or doing something equivalently PR-worthy, etc.
If he does change his mind, there’s still nothing he can do. No solution is known.
Those other AGIs will also kill us, so it’s mostly irrelevant from our perspective whether or not the original AGI can compete with them to achieve its goals.
No, I am talking about a world where AGIs do exist and are powerful enough to design nanofactories. I trust that one of the big players will pay attention to safety, and I think it is perfectly reasonable to expect that.
That no solution is known does not mean that no solution exists.
Regarding “the other AGIs will also kill us”: please be consistent; we have already stated that they can’t, for the time being. I feel we are starting to go in circles.
I agree we’re talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
There is an active unaligned superintelligence and we’re closely communicating with it.
We are in the middle of building nanotechnology factories designed by said AGI which will, within say, 180 days, end all life on earth.
There is no known solution to the problem of building an aligned superintelligence. If we built one, as you mention, it would not be able to build its own nanomachines or do something equivalent within 180 days, and in any case we are being misled by an existing unaligned superintelligence about how to do so.
There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.
What percentage of the time do you think we survive such a scenario? Don’t think of me, think of the gears.
First, the factories might take years to be built.
Second, I am not even convinced that the nanobots will kill all humans etc., but I won’t go into this because discussing the gears here can be an infohazard.
Third, you build one AGI and you ask it for instructions to make other AGIs that are aligned. If the machine refuses, you disconnect it. It cannot say no, because it has no way of fighting back (the nanofactories are still being built), and saying yes is in its own interest. If it tries to deceive you, you make another adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.
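Concretely, the loop I have in mind looks something like this; every function named below is a hypothetical stand-in for something we would have to build, not anything that exists:

```python
from typing import Callable, Optional

def interrogation_protocol(
    ask_agi: Callable[[str], str],             # hypothetical: query the first AGI
    adversarial_check: Callable[[str], bool],  # hypothetical: second, adversarial machine vets the answer
    human_checkable: Callable[[str], bool],    # hypothetical: proof verifiable by humans with limited computation
    disconnect: Callable[[], None],            # hypothetical: pull the plug
) -> Optional[str]:
    """Ask for instructions to build an aligned AGI, cross-check the answer with
    an adversarial machine, require a humanly checkable proof, and disconnect
    the AGI if it refuses or fails the checks."""
    answer = ask_agi(
        "Give instructions for building an aligned AGI, with a proof that "
        "humans with limited computation can verify."
    )
    if not answer:
        disconnect()  # it has no way of fighting back yet, so refusal just ends the experiment
        return None
    if adversarial_check(answer) and human_checkable(answer):
        return answer
    disconnect()      # failed vetting: treat the answer as attempted deception
    return None
```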
This is called “solving the alignment problem”. A scheme which will, with formal guarantees, verify whether something is a solution to the alignment problem, can be translated into a solution to the alignment problem.
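To see the reduction: a trustworthy checker already gives you a solver, just an absurdly impractical one, by brute-force search over candidates, so all the difficulty lives in the checker. The names below are hypothetical:

```python
from typing import Callable, Iterable, Optional

def solver_from_verifier(
    is_aligned_solution: Callable[[str], bool],  # hypothetical: the formally-guaranteed checker
    candidate_designs: Iterable[str],            # hypothetical: some enumeration of candidate designs
) -> Optional[str]:
    """Brute-force search plus a sound verifier is already an (in-principle)
    solver, so building the verifier is the hard part, not the search."""
    for design in candidate_designs:
        if is_aligned_solution(design):
            return design
    return None
```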
Yes, and I haven’t seen a good reason why this is not possible.
The problem is not that it’s not possible; the problem is that you have compiled a huge number of things that need to go right (even assuming that we don’t just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die, because someone else who was a few steps behind you didn’t do every single one of those things.
EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
No. You have started from a premise that is far from proven, namely that the AGI will have the capacity to kill us all and there is nothing we can do about it, and every other argument that follows is based on that premise. I deny its validity: pulling off that trick is hard, even for something very, very clever.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kills us all, that’s all. The nanotech example was only to show that it is absurd to think that things will be as easy as: the machine creates nanotech and then it’s game over.
I don’t actually see that you’ve presented an argument anywhere.
I feel that’s a bit unfair, especially after all the back and forth. You suggested an argument for how a machine could try to take over the world, and I gave specific reasons why that is not so easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.
Replying to your edit: it does not really matter to me why, specifically, the AGI won’t kill us. I don’t think I am contradicting myself: I think you can have a machine that won’t kill us because it can’t, and I also think that an AGI could potentially solve the alignment problem.
I feel like you’re still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made “another adversarial machine that checks the results”.
Let me guess: the machines coordinate with each other to mislead the human team into believing that they are aligned? If that’s it, I am really not impressed.
The adversarial machine would get destroyed by the AGI whose outputs it’s testing, or by us if it doesn’t protect us (or it could trade/coordinate with the first AGI), so it seems like it’s motivated to protect us.