Let’s put aside the question of whether an AGI would be able to not just solve the technical (theoretical and engineering) problems of nanotech, but also the practical ones under constraints of secrecy. How do you get to a world where AGI solves nanotech and then we don’t build nanotech fabs after it gives us the schematics for them?
We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.
Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.
Why is not possible to check whether those nanobots are dangerous beforehand? In bjotech we already do that. For instance, if someone would try to synthesise some DNA sequences from certain bacteria, all alarms would go off.
Sorry, I might have not been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won’t have controls of what’s being produced in them?
Controls that who wrote? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?
Can you verify code to be sure there’s no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.
We’ll have advanced AI by then we could use to help verify inputs or the design, or, as I said, we could use stricter standards, if nanotechnology is recognized as potentially dangerous.
A single flaw and them all humans die at once? I don’t see how. Or better put, I can conceive many reasons why this plan fails. Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
Or better put, I can conceive many reasons why this plan fails.
Then could you produce a few of the main ones, to allow for examination?
Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
What’s the time window in your scenario? As I noted in a different comment, I can agree with “days” as you initially stated. That’s barely enough time for the EA community to notice there’s a problem.
Anything (edit: except solutions of mathematical problems) that’s not difficult to understand isn’t powerful enough to be valuable.
Not to mention the AGI has the ability to fool both us and our AI into thinking it’s easy to understand and harmless, and then it will kill us all anyway.
Anything that’s not difficult to understand isn’t powerful enough to be valuable.
This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I’d say science and engineering are full of “metaphorically” NP problems that fit that description: you’re searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
If we were serious about getting useful nanotech from an AGI, I think we’d ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.
Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force.
That’s a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn’t remember this, and I was thinking about physical problems (like nanosystems).
For difficult problems in physical universe though, we can’t easily non-empirically check the solution. (For example, it’s not possible to non-empirically check if a molecule affects the human body in a desired way, and I’d expect that non-empirically checking if a nanosystem is safe would be at least as hard.)
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”. In particular, I think this class of questions is pretty safe: “Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?”
So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we’ve learned something valuable about the AGI. If its answers are highly constrained, like “reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel”, then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”.
I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).
So such an AI won’t kill us by giving us that advice, but it will kill us in other ways.
(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)
Please notice that I never said that an AGI won’t be unsafe.
If you admit that it is possible that at some point we can be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn’t agree with that because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against
I think no one, incuding EY, doesn’t think “humanity ends as soon as we have an AGI”. Actual opinion is “Agentic AGI that optimize something and ends humanity in the process will probably by default be created before we will solve alignment or will be able to commit pivotal act that prevents the creation of such AGI”. As I understand, EY thinks that we probably can create non-agentic or weak AGI that will not kill us all, but it will not prevent strong agentic AGI that will.
Jokes aside, yes, maybe we do build those factories. How long does it take? What does the AGI do in the meantime? Why can’t we threaten it with disconnection if it doesn’t solve the alignment problem?
How long does it take? What does the AGI do in the meantime?
Doesn’t really matter if we’re building the factories. Perhaps it’s making copies of itself, doing whatever least likely to get it disconnected; we’re dead in N days so we are pretty much entirely off the chessboard.
Why can’t we threaten it with disconnection if it doesn’t solve the alignment problem?
You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally.
Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI.
Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem?
“You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally.”
No, I am drawing the logical conclusion that if an AGi is built and does not automatically kills all humans (and it has been previously stated that we have at least N days), an organisation wanting to solve the alignment problem can create another AGI
“Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI”
Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?
Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem?
Because the EA community does not control the major research labs, and also doesn’t know how to use a misaligned AGI safely to do that. “Use AGI to get a solution to the alignment problem” is a very common suggestion, but if we knew how to do that, and we controlled the major research labs, we do that the first time instead of just making the unaligned AGI.
Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?
It’s not that we won’t try. It’s that we’re unable. We will take the argument this superintelligent machine gives us and go “oh, that looks right” and kill ourselves in the way it suggests. If there were a predefined method of verifying an agents’ adherance to CEV, that would go a long way of getting us to the alignment problem, but we have no such verification method.
“Because the EA community does not control the major research labs”
Fine, replace that by any lab that cares about AGI. Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem? Is that really harder to imagine than a superelaborate plan to kill all humans?
“Doesn’t know how to use a misaligned AGI safely to do that.”
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed hasn’t it?
“It’s not that we won’t try. It’s that we’re unable”
Is there the equivalent to a Godel theorem proving this? If not, how are you so sure?
Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem?
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed hasn’t it?
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off because that would lead to something besides us being dead. It will make sure any latter plan is either ineffective or will, for some reason, not in practice lead to aligned AGI because the nanomachines will get to us first.
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
Don’t we have the resources and people to set up such a lab? If you think we don’t have the compute (and couldn’t get access to enough cloud compute or wouldn’t want to), that’s something we could invest in now, since there’s still time. Also, if there are still AI safety teams at any of the existing big labs, can’t they start their own projects there?
Don’t we have the resources and people to set up such a lab?
At present, not by a long shot. And doing so would probably make the problem worse; if we didn’t solve the underlying problem DeepMind would do whatever it was they were going to do anyways, except faster.
I find incredible that people can conceive AGI-designed nanofactories built in months but cannot imagine that a big lab can spend some spare time or money into looking at this, especially when there are probably people from those companies who are frequent LW readers.
I might have missed it, but it seems to be the first time you talk about “months” in your scenario. Wasn’t it “days” before? It matters because I don’t think it would take months for an AGI to built a nanotech factory.
Son, I wrote an entire longform explaining why we need to we attempt this. It’s just hard. The ML researchers and Google executives who are relevant to these decisions have a financial stake in speeding capabilities research along as fast as possible, and often have very dead-set views about AGI risk advocates being cultists or bikeshedders or alarmists. There is an entire community stigma against even talking about these issues in the neck of the woods you speak of. I agree that redirecting money from capabilities research to anything called alignment research would be good on net, but the problem is finding clear ways of doing that.
I don’t think it’s impossible! If you want to help, I can give you some tasks to start with. But we’re already tryin
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off because that would lead to something besides us being dead. It will make sure any latter plan is either ineffective or will, for some reason, not in practice lead to aligned AGI because the nanomachines will get to us first.
Can we build a non-agential AGI to solve alignment? Or just a very smart task-specific AI?
If you come up with a way to build an AI that hasn’t crossed the rubicon of dangerous generality, but can solve alignment, that would very helpful. It doesn’t seem likely to be possible without already knowing how to solve alignment.
You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don’t know how to formalize the alignment problem in a way that we can feed it into a theorem prover.
A model that can “solve alignment” for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.
We might be able to formalize some pieces of the alignment problem, like MIRI tried with corrigibility. Also Vanessa Kosoy has some more formal work, too. Do you think there are no useful pieces to formalize? Or that all the pieces we try to formalize won’t together be enough even if we had solutions to them?
Also, even if it explicitly models humans, would it need to be consequentialist? Could we just have a powerful modeller trained to minimize prediction loss or whatever? The search space may be huge, but having a powerful modeller still seems plausibly useful. We could also filter options, possibly with a separate AI, not necessarily an AGI.
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about it?
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about?
What about the above situation looks to someone like Yann Lecunn or Demis Hassabis in-the-moment like it should change their mind? The AGI isn’t saying “I’m going to kill you all”. It’s delivering persuasive and cogent arguments as to how misaligned intelligence is impossible, curing cancer or doing something equivalently PR-worthy, etc. etc. etc.
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
Those other AGIs will also kill us, so it’s mostly irrelevant from our perspective whether or not the original AGI can compete with them to achieve its goals.
No, I am talking about a world where AGIs do exist and are powerful enough to design nanofactories. I trust that one of the big players will pay attention to safety, and I think it is perfectly reasonable
No solution is known does not mean that no solution exist
Re. The other AGIs will also kill us, please, be coherent: we have already stated that they can’t for the time being. I feel we start going in circles
I agree we’re talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
There is an active unaligned superintelligence and we’re closely communicating with it.
We are in the middle of building nanotechnology factories designed by said AGI which will, within say, 180 days, end all life on earth.
There is no known solution to the problem of building an aligned superintelligence. If we built one, as you mention, it would not be able to build its own nanomachines or do something equivalent within 180 days, and also we are being misled by an existing unaligned superintelligence on how to do so.
There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.
What percentage of the time do you think we survive such a scenario? Don’t think of me, think of the gears.
First, the factories might take years to be built.
Second, I am not even convinced that the nanobotswill kill all humans etc, but I won’t go into this because discussing the gears here can be an infohazard.
Third, you build one AGI and you ask it the instructions to make other AGIs that are aligned. If the machine does not want, you disconnect it. It cannot say no because it has no way of fighting back (nanofactories still being built), and saying yes goes in its own interest. If it tries to deceive you, you make another adversarial machine that checks the results. Or you make as part of the goal that the result is provable by humans with limited computation.
If it tries to deceive you, you make another adversarial machine that checks the results. Or you make as part of the goal that the result is provable by humans with limited computation.
This is called “solving the alignment problem”. A scheme which will, with formal guarantees, verify whether something is a solution to the alignment problem, can be translated into a solution to the alignment problem.
The problem is not that it’s not possible, the problem is that you have compiled a huge number of things that need need to go right (even assuming that we don’t just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die because someone else who was a few steps behind you didn’t do every single one of those things.
EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
No. You have started from a premise that is far from proven, which is, the AGI will have the capacity to kill us all and there is nothing we can do about it, and any other argument that follow is based in that premise, of which I deny its validity saying that doing that trick is hard even being very very clever.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kill us all, that’s all. The nanotech example was only to show that it is absurd to think that things will be so easy as: the machine creates nanotech and then game over
I feel that’s a bit unfair, especially after all the back and forth. You suggested an argument on how a machine can try to take over the world and I argued with specific reasons why that is not that easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.
Replying to your edit, it does not really matter to me why specifically the AGI wont kill us. I think I am not contradicting myself: I think that you can have a machine that won’t kill us because it can’t and I also think that an AGI could potentially solve the alignment problem.
I feel like you’re still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made “another adversarial machine that checks the results”.
Let me guess. The machines coordinate themselves to mislead the human team into believing that they are aligned? If that’s it, I am really not impressed
The adversarial machine would get destroyed by the AGI whose outputs its testing or us if it doesn’t protect us (or trade/coordinate with the first AGI), so it seems like it’s motivated to protect us.
Who is this “we” that you’re imagining refuses to interact with the outputs of the AGI except to demand a solution to the alignment problem? (And why would they be asking that, given that it seems to already be aligned?)
EDIT: remember, we’re operating in a regime where the organizations at the forefront of AI capabilities are not the ones who seem to be terribly concerned with alignment risk! Deepmind and OpenAI have safety teams, but I’d be very surprised if those safety teams actually had the power to unilaterally control all usage of and interactions with trained models.
Fine, replace we by any group that has access to the AGI. In the world you are describing there is a time window between AGI is developed and nanofactories are built, so I expect that more than one AGI can be made during that time by different organisations. Why can’t MIRi in that world develop their own AGI and then use it?
Two cases are possible: Either a singleton is established and it is able to remain a singleton due to strategic interests (of either AGI or the group), or a singleton loses its lead and we have a multipolar situation with more than 1 groups having AGI.
In case 1, if the lead established is say, 6 months or more, it might not be possible for the 2nd place group to get there as the work done during this period by the lead would be driven by intelligence explosion, and far faster than the 2nd. This only incentivizes going forward as fast as possible and is not a good safety mindset.
In case 2, we have the risk of multiple projects developing AGI and thus the risk of something going wrong also increases. Even if group 1 is able to implement safety measures, some other group might fail, and the outcome would be disastrous, unless AGI by the Group 1 is specifically going to solve the control problem for us.
That’s ok because it won’t have human killing capabilities (just following your example!). Why can’t the AGI find the solution to the alignment problem?
Please read carefully my post, because I think I have been very clear of what it is that I am arguing against. If you think that EY is just saying that our civilization can be disruptive, you are not paying attention
I am just following the example that they gave me to show that things are in fact more complicated to what they are suggesting. To be clear, in the example, the AGI looks for a way to kill humans using nanotech but it first needs to build those nanotech factories
I’m confused—didn’t OP just say they don’t expect nanotechnology to be solvable , even with AGI? If so, than you seem to be assuming the crux in your question…
To clarify, I do think that creating nanobots is solvable. That is one thing: making factories, making designs that kill humans, deploying those nanobots, doing everything without raising any alarms and at risk close to zero, is, in my opinion, impossible.
I want to remark that people keep using the argument of nanotechnology totally uncritically, as it were the magical solution that makes an AGI take over the world in two weeks. They are not really considering the gears inside that part of the model
If OP doesn’t think nanotech is solvable in principle, I’m not sure where to take the conversation, since we already have an existence proof (i.e. biology). If they object to specific nanotech capabilities that aren’t extant in existing nanotech but aren’t ruled out by the laws of physics, that requires a justification.
Let’s put aside the question of whether an AGI would be able to not just solve the technical (theoretical and engineering) problems of nanotech, but also the practical ones under constraints of secrecy. How do you get to a world where AGI solves nanotech and then we don’t build nanotech fabs after it gives us the schematics for them?
We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.
Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.
You aren’t going to get designs for specific nanotech, you’re going to get designs for generic nanotech fabricators.
Why is not possible to check whether those nanobots are dangerous beforehand? In bjotech we already do that. For instance, if someone would try to synthesise some DNA sequences from certain bacteria, all alarms would go off.
Can you reread what I wrote?
Sorry, I might have not been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won’t have controls of what’s being produced in them?
Controls that who wrote? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?
How do the fabricators work? We can verify their inputs, too, right?
Can you verify code to be sure there’s no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.
We’ll have advanced AI by then we could use to help verify inputs or the design, or, as I said, we could use stricter standards, if nanotechnology is recognized as potentially dangerous.
A single flaw and them all humans die at once? I don’t see how. Or better put, I can conceive many reasons why this plan fails. Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
Then could you produce a few of the main ones, to allow for examination?
What’s the time window in your scenario? As I noted in a different comment, I can agree with “days” as you initially stated. That’s barely enough time for the EA community to notice there’s a problem.
Anything (edit: except solutions of mathematical problems) that’s not difficult to understand isn’t powerful enough to be valuable.
Not to mention the AGI has the ability to fool both us and our AI into thinking it’s easy to understand and harmless, and then it will kill us all anyway.
This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I’d say science and engineering are full of “metaphorically” NP problems that fit that description: you’re searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
If we were serious about getting useful nanotech from an AGI, I think we’d ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.
That’s a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn’t remember this, and I was thinking about physical problems (like nanosystems).
For difficult problems in physical universe though, we can’t easily non-empirically check the solution. (For example, it’s not possible to non-empirically check if a molecule affects the human body in a desired way, and I’d expect that non-empirically checking if a nanosystem is safe would be at least as hard.)
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”. In particular, I think this class of questions is pretty safe: “Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?”
So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we’ve learned something valuable about the AGI. If its answers are highly constrained, like “reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel”, then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
There was a thread two months ago where I said similar stuff, here: https://www.lesswrong.com/posts/4T59sx6uQanf5T79h/interacting-with-a-boxed-ai?commentId=XMP4fzPGENSWxrKaA
I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).
So such an AI won’t kill us by giving us that advice, but it will kill us in other ways.
(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)
Please notice that I never said that an AGI won’t be unsafe.
If you admit that it is possible that at some point we can be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn’t agree with that because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against
The AGI will kill us in other ways than its theorem proofs being either-hard-to-check-or-useless, but it will kill us nevertheless.
I think no one, incuding EY, doesn’t think “humanity ends as soon as we have an AGI”. Actual opinion is “Agentic AGI that optimize something and ends humanity in the process will probably by default be created before we will solve alignment or will be able to commit pivotal act that prevents the creation of such AGI”. As I understand, EY thinks that we probably can create non-agentic or weak AGI that will not kill us all, but it will not prevent strong agentic AGI that will.
Maybe a world deeply inadequate? Oh wait...
Jokes aside, yes, maybe we do build those factories. How long does it take? What does the AGI do in the meantime? Why can’t we threaten it with disconnection if it doesn’t solve the alignment problem?
Doesn’t really matter if we’re building the factories. Perhaps it’s making copies of itself, doing whatever least likely to get it disconnected; we’re dead in N days so we are pretty much entirely off the chessboard.
You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally.
Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI.
Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem?
“You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally.”
No, I am drawing the logical conclusion that if an AGi is built and does not automatically kills all humans (and it has been previously stated that we have at least N days), an organisation wanting to solve the alignment problem can create another AGI
“Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI”
Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?
Because the EA community does not control the major research labs, and also doesn’t know how to use a misaligned AGI safely to do that. “Use AGI to get a solution to the alignment problem” is a very common suggestion, but if we knew how to do that, and we controlled the major research labs, we do that the first time instead of just making the unaligned AGI.
“You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we’re dying with.
But if we were on course to die with more dignity than this, we’d still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn’t destroy the world, even if they want that; not because they’re “insufficiently educated” in some solution that is known elsewhere, but because there is no known plan in which to educate them.”
It’s not that we won’t try. It’s that we’re unable. We will take the argument this superintelligent machine gives us and go “oh, that looks right” and kill ourselves in the way it suggests. If there were a predefined method of verifying an agents’ adherance to CEV, that would go a long way of getting us to the alignment problem, but we have no such verification method.
“Because the EA community does not control the major research labs”
Fine, replace that by any lab that cares about AGI. Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem? Is that really harder to imagine than a superelaborate plan to kill all humans?
“Doesn’t know how to use a misaligned AGI safely to do that.”
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed hasn’t it?
“It’s not that we won’t try. It’s that we’re unable”
Is there the equivalent to a Godel theorem proving this? If not, how are you so sure?
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off because that would lead to something besides us being dead. It will make sure any latter plan is either ineffective or will, for some reason, not in practice lead to aligned AGI because the nanomachines will get to us first.
Don’t we have the resources and people to set up such a lab? If you think we don’t have the compute (and couldn’t get access to enough cloud compute or wouldn’t want to), that’s something we could invest in now, since there’s still time. Also, if there are still AI safety teams at any of the existing big labs, can’t they start their own projects there?
At present, not by a long shot. And doing so would probably make the problem worse; if we didn’t solve the underlying problem DeepMind would do whatever it was they were going to do anyways, except faster.
I find incredible that people can conceive AGI-designed nanofactories built in months but cannot imagine that a big lab can spend some spare time or money into looking at this, especially when there are probably people from those companies who are frequent LW readers.
I might have missed it, but it seems to be the first time you talk about “months” in your scenario. Wasn’t it “days” before? It matters because I don’t think it would take months for an AGI to built a nanotech factory.
Son, I wrote an entire longform explaining why we need to we attempt this. It’s just hard. The ML researchers and Google executives who are relevant to these decisions have a financial stake in speeding capabilities research along as fast as possible, and often have very dead-set views about AGI risk advocates being cultists or bikeshedders or alarmists. There is an entire community stigma against even talking about these issues in the neck of the woods you speak of. I agree that redirecting money from capabilities research to anything called alignment research would be good on net, but the problem is finding clear ways of doing that.
I don’t think it’s impossible! If you want to help, I can give you some tasks to start with. But we’re already tryin
Too busy at the moment but if you remind me this in a few months time, I may. Thanks
Can we build a non-agential AGI to solve alignment? Or just a very smart task-specific AI?
If you come up with a way to build an AI that hasn’t crossed the rubicon of dangerous generality, but can solve alignment, that would very helpful. It doesn’t seem likely to be possible without already knowing how to solve alignment.
Why is this?
You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don’t know how to formalize the alignment problem in a way that we can feed it into a theorem prover.
A model that can “solve alignment” for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.
We might be able to formalize some pieces of the alignment problem, like MIRI tried with corrigibility. Also Vanessa Kosoy has some more formal work, too. Do you think there are no useful pieces to formalize? Or that all the pieces we try to formalize won’t together be enough even if we had solutions to them?
Also, even if it explicitly models humans, would it need to be consequentialist? Could we just have a powerful modeller trained to minimize prediction loss or whatever? The search space may be huge, but having a powerful modeller still seems plausibly useful. We could also filter options, possibly with a separate AI, not necessarily an AGI.
I don’t see why not but there is a probably an infalsifiable reason of why this is impossible, and I am looking forward to reading it
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about it?
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
What about the above situation looks to someone like Yann Lecunn or Demis Hassabis in-the-moment like it should change their mind? The AGI isn’t saying “I’m going to kill you all”. It’s delivering persuasive and cogent arguments as to how misaligned intelligence is impossible, curing cancer or doing something equivalently PR-worthy, etc. etc. etc.
If he does change his mind, there’s still nothing he can do. No solution is known.
Those other AGIs will also kill us, so it’s mostly irrelevant from our perspective whether or not the original AGI can compete with them to achieve its goals.
No, I am talking about a world where AGIs do exist and are powerful enough to design nanofactories. I trust that one of the big players will pay attention to safety, and I think it is perfectly reasonable
No solution is known does not mean that no solution exist
Re. The other AGIs will also kill us, please, be coherent: we have already stated that they can’t for the time being. I feel we start going in circles
I agree we’re talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
There is an active unaligned superintelligence and we’re closely communicating with it.
We are in the middle of building nanotechnology factories designed by said AGI which will, within say, 180 days, end all life on earth.
There is no known solution to the problem of building an aligned superintelligence. If we built one, as you mention, it would not be able to build its own nanomachines or do something equivalent within 180 days, and also we are being misled by an existing unaligned superintelligence on how to do so.
There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.
What percentage of the time do you think we survive such a scenario? Don’t think of me, think of the gears.
First, the factories might take years to be built.
Second, I am not even convinced that the nanobotswill kill all humans etc, but I won’t go into this because discussing the gears here can be an infohazard.
Third, you build one AGI and you ask it the instructions to make other AGIs that are aligned. If the machine does not want, you disconnect it. It cannot say no because it has no way of fighting back (nanofactories still being built), and saying yes goes in its own interest. If it tries to deceive you, you make another adversarial machine that checks the results. Or you make as part of the goal that the result is provable by humans with limited computation.
This is called “solving the alignment problem”. A scheme which will, with formal guarantees, verify whether something is a solution to the alignment problem, can be translated into a solution to the alignment problem.
Yes, and I haven’t seen a good reason of why this is not possible.
The problem is not that it’s not possible, the problem is that you have compiled a huge number of things that need need to go right (even assuming that we don’t just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die because someone else who was a few steps behind you didn’t do every single one of those things.
EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
No. You have started from a premise that is far from proven, which is, the AGI will have the capacity to kill us all and there is nothing we can do about it, and any other argument that follow is based in that premise, of which I deny its validity saying that doing that trick is hard even being very very clever.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kill us all, that’s all. The nanotech example was only to show that it is absurd to think that things will be so easy as: the machine creates nanotech and then game over
I don’t actually see that you’ve presented an argument anywhere.
I feel that’s a bit unfair, especially after all the back and forth. You suggested an argument on how a machine can try to take over the world and I argued with specific reasons why that is not that easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.
Replying to your edit, it does not really matter to me why specifically the AGI wont kill us. I think I am not contradicting myself: I think that you can have a machine that won’t kill us because it can’t and I also think that an AGI could potentially solve the alignment problem.
I feel like you’re still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made “another adversarial machine that checks the results”.
Let me guess. The machines coordinate themselves to mislead the human team into believing that they are aligned? If that’s it, I am really not impressed
The adversarial machine would get destroyed by the AGI whose outputs its testing or us if it doesn’t protect us (or trade/coordinate with the first AGI), so it seems like it’s motivated to protect us.
Who is this “we” that you’re imagining refuses to interact with the outputs of the AGI except to demand a solution to the alignment problem? (And why would they be asking that, given that it seems to already be aligned?)
EDIT: remember, we’re operating in a regime where the organizations at the forefront of AI capabilities are not the ones who seem to be terribly concerned with alignment risk! Deepmind and OpenAI have safety teams, but I’d be very surprised if those safety teams actually had the power to unilaterally control all usage of and interactions with trained models.
Fine, replace we by any group that has access to the AGI. In the world you are describing there is a time window between AGI is developed and nanofactories are built, so I expect that more than one AGI can be made during that time by different organisations. Why can’t MIRi in that world develop their own AGI and then use it?
Two cases are possible: Either a singleton is established and it is able to remain a singleton due to strategic interests (of either AGI or the group), or a singleton loses its lead and we have a multipolar situation with more than 1 groups having AGI.
In case 1, if the lead established is say, 6 months or more, it might not be possible for the 2nd place group to get there as the work done during this period by the lead would be driven by intelligence explosion, and far faster than the 2nd. This only incentivizes going forward as fast as possible and is not a good safety mindset.
In case 2, we have the risk of multiple projects developing AGI and thus the risk of something going wrong also increases. Even if group 1 is able to implement safety measures, some other group might fail, and the outcome would be disastrous, unless AGI by the Group 1 is specifically going to solve the control problem for us.
...because it still won’t be aligned?
That’s ok because it won’t have human killing capabilities (just following your example!). Why can’t the AGI find the solution to the alignment problem?
An AGI doesn’t have to kill humans directly for our civilization to be disrupted.
Why would the AGI not have capabilities to pursue this if needed?
Please read carefully my post, because I think I have been very clear of what it is that I am arguing against. If you think that EY is just saying that our civilization can be disruptive, you are not paying attention
I am just following the example that they gave me to show that things are in fact more complicated to what they are suggesting. To be clear, in the example, the AGI looks for a way to kill humans using nanotech but it first needs to build those nanotech factories
I’m confused—didn’t OP just say they don’t expect nanotechnology to be solvable , even with AGI? If so, than you seem to be assuming the crux in your question…
To clarify, I do think that creating nanobots is solvable. That is one thing: making factories, making designs that kill humans, deploying those nanobots, doing everything without raising any alarms and at risk close to zero, is, in my opinion, impossible.
I want to remark that people keep using the argument of nanotechnology totally uncritically, as it were the magical solution that makes an AGI take over the world in two weeks. They are not really considering the gears inside that part of the model
If OP doesn’t think nanotech is solvable in principle, I’m not sure where to take the conversation, since we already have an existence proof (i.e. biology). If they object to specific nanotech capabilities that aren’t extant in existing nanotech but aren’t ruled out by the laws of physics, that requires a justification.