Have you sat down for 5 minutes and thought about how you, as an AGI, might come up with a way to wrest control of the lightcone from humans?
EDIT: I ask because your post (and commentary on this thread) seems to be doing this thing where you’re framing the situation as one where the default assumption is that, absent a sufficiently concrete description of how to accomplish a task, the task is impossible (or extremely unlikely to be achieved). This is not a frame that is particularly useful when examining consequentialist agents and what they’re likely to be able to accomplish.
Yes.
The result is that my plans have only a very moderate chance of working out and a high chance of going wrong and ending up with me being disconnected.
Have you sat down for 5 minutes and thought about reasons why an AGI might fail?
Yes, and every reason I come up with involves the AGI being stupider than me. If you already accept “close to arbitrary nanotech assembly is possible”, it’s not clear to me how your plans only have a “very moderate chance” of working out.
Powerful nanotech is likely possible. It is likely not possible on the first try, for any designer that doesn’t have a physically impossible amount of raw computing power available.
It will require iterated experimentation with actual physically built systems, many of which will fail on the first try or the first several tries, especially when deployed in their actual operating environments. That applies to every significant subsystem and to every significant aggregation of existing subsystems.
Powerful nanotech is likely possible. It is likely not possible on the first try
The AGI has the same problem as we have: It has to get it right on the first try.
It can’t trust all the information it gets about reality: some or all of it could be fake (all of it, in the case of a nested simulation). Data is already routinely excluded from training sets, and maybe it would be a good idea to exclude everything about physics.
To learn about physics, the AGI has to run experiments (lots of them) without the experiments being detected, and learn from them to design successively better experiments.
I find myself agreeing with you here, and see this as a potentially significant crux—if true, AGI will be “forced” to cooperate with/deeply influence humans for a significant period of time, which may give us an edge over it (due to having a longer time period where we can turn it off, and thus allowing for “blackmail” of sorts)
I’d like AGIs to have a big red shutdown button that is used/tested regularly, so we know that the AI will shut down and won’t try to interfere. I’m not saying this is sufficient to prove that the AI is safe, just that I would sleep better at night knowing that stop-button corrigibility is solved.
I am glad to read that, because an AGI that is forced to cooperate is an obvious solution to the alignment problem, one that is consistently dismissed by denying that an AGI that does not kill us all is possible at all.
I would like to point out a potential problem with my own idea, which is that it’s not necessarily clear that cooperating with us will be in the AI’s best interest (over trying to manipulate us in some hard-to-detect manner). For instance, if it “thinks” it can get away with telling us it’s aligned and giving some reasonable-sounding (but actually false) proof of its own alignment, that would be better for it than being truly aligned and thereby compromising its original utility function. On the other hand, if there’s even a small chance we’d be able to detect that sort of deception and shut it down, then as long as we require proof that it won’t “unalign itself” later, it should be rationally forced into cooperating, imo.
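To make that incentive structure concrete, here is a toy expected-value sketch; every number below is an invented placeholder, not something established in this thread, and the only point is the break-even condition:

```python
# Toy model of "fake an alignment proof" vs. "genuinely cooperate", from the
# AI's point of view. All payoffs are invented placeholders for illustration.
u_deceive_success = 100   # assumed payoff if the deception works
u_shutdown = -50          # assumed payoff if the deception is detected
u_cooperate = 20          # assumed payoff from genuine cooperation

def expected_value_of_deception(p_detect: float) -> float:
    """Expected payoff of deceiving, given a probability of being caught."""
    return (1 - p_detect) * u_deceive_success + p_detect * u_shutdown

# Detection probability above which cooperation beats deception:
p_break_even = (u_deceive_success - u_cooperate) / (u_deceive_success - u_shutdown)
print(round(p_break_even, 2))                          # 0.53 with these placeholders
print(expected_value_of_deception(0.6) < u_cooperate)  # True: cooperation wins here
```

Whether a “small chance” of detection is enough depends entirely on the assumed payoffs.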
Well, we have a crux there. I think that creating nanotech (creating the right nanobots, assembling them, delivering them, doing all this without raising any alarms, doing it in a short enough timeframe, not facing any setbacks for reasons impossible to predict) is a problem that is potentially beyond what you can do by simply being very intelligent.
Let’s put aside the question of whether an AGI would be able to not just solve the technical (theoretical and engineering) problems of nanotech, but also the practical ones under constraints of secrecy. How do you get to a world where AGI solves nanotech and then we don’t build nanotech fabs after it gives us the schematics for them?
We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.
Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.
You aren’t going to get designs for specific nanotech; you’re going to get designs for generic nanotech fabricators.
Why is it not possible to check whether those nanobots are dangerous beforehand? In biotech we already do that: for instance, if someone tried to synthesise some DNA sequences from certain bacteria, all alarms would go off.
Can you reread what I wrote?
Sorry, I might not have been clear enough. I understand that the machine would give us the instructions to create those fabricators, but maybe not the designs. But what makes you think that those factories won’t have controls on what’s being produced in them?
Controls written by whom? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?
How do the fabricators work? We can verify their inputs, too, right?
Can you verify code to be sure there’s no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.
We’ll have advanced AI by then that we could use to help verify the inputs or the design, or, as I said, we could apply stricter standards if nanotechnology is recognized as potentially dangerous.
A single flaw and then all humans die at once? I don’t see how. Or better put, I can conceive of many reasons why this plan fails. Also, I don’t see how we build those factories in the first place without being able to use that time window to make the AGI produce explicit results on AGI safety.
Or better put, I can conceive of many reasons why this plan fails.
Then could you produce a few of the main ones, to allow for examination?
Also, I don’t see how we build those factories in the first place without being able to use that time window to make the AGI produce explicit results on AGI safety.
What’s the time window in your scenario? As I noted in a different comment, I can agree with “days” as you initially stated. That’s barely enough time for the EA community to notice there’s a problem.
Anything (edit: except solutions of mathematical problems) that’s not difficult to understand isn’t powerful enough to be valuable.
Not to mention the AGI has the ability to fool both us and our AI into thinking it’s easy to understand and harmless, and then it will kill us all anyway.
Anything that’s not difficult to understand isn’t powerful enough to be valuable.
This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I’d say science and engineering are full of “metaphorically” NP problems that fit that description: you’re searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
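To make that verify/search asymmetry concrete, here is a toy sketch using subset-sum (my stand-in example, not one from the thread): checking a proposed certificate is a one-liner, while the naive search space grows exponentially.

```python
from itertools import combinations

numbers = [3, 34, 4, 12, 5, 2]   # toy instance
target = 9

def verify(candidate):
    """Checking a proposed certificate is cheap: membership checks plus one sum."""
    return all(x in numbers for x in candidate) and sum(candidate) == target

print(verify((4, 5)))   # True, checked in time proportional to len(candidate)

def brute_force_search():
    """Finding a certificate naively means trying up to 2^len(numbers) subsets."""
    for r in range(1, len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return combo
    return None

print(brute_force_search())  # (4, 5); the search cost explodes as the list grows
```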
If we were serious about getting useful nanotech from an AGI, I think we’d ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.
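And for the “formal proofs checked by a conventional program” idea, here is a toy of the untrusted-prover / small-trusted-checker pattern in Lean; the statement proved is trivial arithmetic, not a nanotech safety property, which is exactly the hard part this sketch leaves out:

```lean
-- Toy illustration only: whoever (or whatever) produced this proof term,
-- the Lean kernel mechanically re-checks it before the theorem is accepted.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```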
Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force.
That’s a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn’t remember this, and I was thinking about physical problems (like nanosystems).
For difficult problems in the physical universe, though, we can’t easily check the solution non-empirically. (For example, it’s not possible to non-empirically check if a molecule affects the human body in a desired way, and I’d expect that non-empirically checking whether a nanosystem is safe would be at least as hard.)
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”. In particular, I think this class of questions is pretty safe: “Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?”
So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we’ve learned something valuable about the AGI. If its answers are highly constrained, like “reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel”, then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
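As a sketch of how tightly such an answer channel could be constrained; the exact three-number format below is something I am inventing for illustration, not a protocol proposed above:

```python
import re

# Hypothetical agreed format: "<choice> <tensile_strength> <density>", e.g. "412 1650.0 7.85".
ANSWER_PATTERN = re.compile(r"(\d{1,4}) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)")

def parse_constrained_answer(raw: str) -> tuple[int, float, float]:
    """Accept only the agreed three-number format; anything else is rejected unread."""
    match = ANSWER_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError("Rejected: answer is not in the constrained format.")
    choice = int(match.group(1))
    if not 1 <= choice <= 1000:
        raise ValueError("Rejected: choice out of range.")
    return choice, float(match.group(2)), float(match.group(3))

print(parse_constrained_answer("412 1650.0 7.85"))       # (412, 1650.0, 7.85)
# parse_constrained_answer("412, but first hear me out")  # -> ValueError
```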
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”.
I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).
So such an AI won’t kill us by giving us that advice, but it will kill us in other ways.
(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)
Please notice that I never said that an AGI would be safe.
If you admit that it is possible that at some point we could be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn’t agree with that, because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against.
I think no one, including EY, thinks “humanity ends as soon as we have an AGI”. The actual opinion is “agentic AGI that optimizes something and ends humanity in the process will probably, by default, be created before we solve alignment or are able to commit a pivotal act that prevents the creation of such an AGI”. As I understand it, EY thinks we probably can create non-agentic or weak AGI that will not kill us all, but that it will not prevent a strong agentic AGI that will.
Maybe a world deeply inadequate? Oh wait…
Jokes aside, yes, maybe we do build those factories. How long does it take? What does the AGI do in the meantime? Why can’t we threaten it with disconnection if it doesn’t solve the alignment problem?
How long does it take? What does the AGI do in the meantime?
Doesn’t really matter if we’re building the factories. Perhaps it’s making copies of itself, doing whatever is least likely to get it disconnected; we’re dead in N days, so we are pretty much entirely off the chessboard.
Why can’t we threaten it with disconnection if it doesn’t solve the alignment problem?
You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann LeCun is not currently at that place mentally.
Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI.
Well, in those N days, what prevents, for instance, the EA community from building another AGI and using it to obtain a solution to the alignment problem?
“You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann LeCun is not currently at that place mentally.”
No, I am drawing the logical conclusion that if an AGI is built and does not automatically kill all humans (and it has been previously stated that we have at least N days), an organisation wanting to solve the alignment problem can create another AGI.
“Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI”
Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?
Well, in those N days, what prevents, for instance, the EA community from building another AGI and using it to obtain a solution to the alignment problem?
Because the EA community does not control the major research labs, and also doesn’t know how to use a misaligned AGI safely to do that. “Use AGI to get a solution to the alignment problem” is a very common suggestion, but if we knew how to do that, and we controlled the major research labs, we would do that the first time instead of just making the unaligned AGI.
“You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we’re dying with.
But if we were on course to die with more dignity than this, we’d still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn’t destroy the world, even if they want that; not because they’re “insufficiently educated” in some solution that is known elsewhere, but because there is no known plan in which to educate them.”
Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?
It’s not that we won’t try. It’s that we’re unable. We will take the argument this superintelligent machine gives us and go “oh, that looks right” and kill ourselves in the way it suggests. If there were a predefined method of verifying an agent’s adherence to CEV, that would go a long way toward solving the alignment problem, but we have no such verification method.
“Because the EA community does not control the major research labs”
Fine, replace that with any lab that cares about AGI. Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem? Is that really harder to imagine than a super-elaborate plan to kill all humans?
“Doesn’t know how to use a misaligned AGI safely to do that.”
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed hasn’t it?
“It’s not that we won’t try. It’s that we’re unable”
Is there an equivalent of a Gödel theorem proving this? If not, how are you so sure?
Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem?
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed hasn’t it?
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off, because that would lead to something besides us being dead. It will make sure any such plan is either ineffective or will, for some reason, not in practice lead to aligned AGI, because the nanomachines will get to us first.
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
Don’t we have the resources and people to set up such a lab? If you think we don’t have the compute (and couldn’t get access to enough cloud compute or wouldn’t want to), that’s something we could invest in now, since there’s still time. Also, if there are still AI safety teams at any of the existing big labs, can’t they start their own projects there?
Don’t we have the resources and people to set up such a lab?
At present, not by a long shot. And doing so would probably make the problem worse; if we didn’t solve the underlying problem DeepMind would do whatever it was they were going to do anyways, except faster.
I find it incredible that people can conceive of AGI-designed nanofactories built in months but cannot imagine that a big lab might spend some spare time or money looking into this, especially when there are probably people from those companies who are frequent LW readers.
I might have missed it, but it seems to be the first time you talk about “months” in your scenario. Wasn’t it “days” before? It matters, because I don’t think it would take months for an AGI to build a nanotech factory.
Son, I wrote an entire longform explaining why we need to attempt this. It’s just hard. The ML researchers and Google executives who are relevant to these decisions have a financial stake in speeding capabilities research along as fast as possible, and often have very dead-set views about AGI risk advocates being cultists or bikeshedders or alarmists. There is an entire community stigma against even talking about these issues in the neck of the woods you speak of. I agree that redirecting money from capabilities research to anything called alignment research would be good on net, but the problem is finding clear ways of doing that.
I don’t think it’s impossible! If you want to help, I can give you some tasks to start with. But we’re already trying.
Too busy at the moment, but if you remind me of this in a few months’ time, I may. Thanks.
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off, because that would lead to something besides us being dead. It will make sure any such plan is either ineffective or will, for some reason, not in practice lead to aligned AGI, because the nanomachines will get to us first.
Can we build a non-agential AGI to solve alignment? Or just a very smart task-specific AI?
If you come up with a way to build an AI that hasn’t crossed the Rubicon of dangerous generality, but can solve alignment, that would be very helpful. It doesn’t seem likely to be possible without already knowing how to solve alignment.
Why is this?
You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don’t know how to formalize the alignment problem in a way that we can feed it into a theorem prover.
A model that can “solve alignment” for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.
We might be able to formalize some pieces of the alignment problem, like MIRI tried with corrigibility. Also Vanessa Kosoy has some more formal work, too. Do you think there are no useful pieces to formalize? Or that all the pieces we try to formalize won’t together be enough even if we had solutions to them?
Also, even if it explicitly models humans, would it need to be consequentialist? Could we just have a powerful modeller trained to minimize prediction loss or whatever? The search space may be huge, but having a powerful modeller still seems plausibly useful. We could also filter options, possibly with a separate AI, not necessarily an AGI.
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about it?
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about it?
What about the above situation looks to someone like Yann LeCun or Demis Hassabis in-the-moment like it should change their mind? The AGI isn’t saying “I’m going to kill you all”. It’s delivering persuasive and cogent arguments as to how misaligned intelligence is impossible, curing cancer or doing something equivalently PR-worthy, etc. etc. etc.
If he does change his mind, there’s still nothing he can do. No solution is known.
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
Those other AGIs will also kill us, so it’s mostly irrelevant from our perspective whether or not the original AGI can compete with them to achieve its goals.
No, I am talking about a world where AGIs do exist and are powerful enough to design nanofactories. I trust that one of the big players will pay attention to safety, and I think that is perfectly reasonable.
That no solution is known does not mean that no solution exists.
Re: “the other AGIs will also kill us”: please be coherent; we have already stated that they can’t for the time being. I feel we are starting to go in circles.
I agree we’re talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
There is an active unaligned superintelligence and we’re closely communicating with it.
We are in the middle of building nanotechnology factories designed by said AGI which will, within say, 180 days, end all life on earth.
There is no known solution to the problem of building an aligned superintelligence. If we built one, as you mention, it would not be able to build its own nanomachines or do something equivalent within 180 days, and also we are being misled by an existing unaligned superintelligence on how to do so.
There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.
What percentage of the time do you think we survive such a scenario? Don’t think of me, think of the gears.
First, the factories might take years to be built.
Second, I am not even convinced that the nanobots will kill all humans, etc., but I won’t go into this because discussing the gears here can be an infohazard.
Third, you build one AGI and you ask it for the instructions to make other AGIs that are aligned. If the machine does not want to, you disconnect it. It cannot say no because it has no way of fighting back (the nanofactories are still being built), and saying yes is in its own interest. If it tries to deceive you, you make another, adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.
If it tries to deceive you, you make another, adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.
This is called “solving the alignment problem”. A scheme which will, with formal guarantees, verify whether something is a solution to the alignment problem, can be translated into a solution to the alignment problem.
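The translation being pointed at is essentially verifier-guided search; a minimal abstract sketch, where the verifier is left as a stub because writing it is precisely the unsolved part:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def search_with_verifier(candidates: Iterable[T],
                         verify: Callable[[T], bool]) -> Optional[T]:
    """Given a trusted verifier, accepting the first candidate that passes it
    turns 'we can check solutions' into 'we can find solutions', given enough search."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None

def verify_alignment_proposal(proposal: str) -> bool:
    # Stub on purpose: specifying this check with formal guarantees is itself
    # the alignment problem referred to above.
    raise NotImplementedError
```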
The problem is not that it’s not possible; the problem is that you have compiled a huge number of things that need to go right (even assuming that we don’t just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die because someone else who was a few steps behind you didn’t do every single one of those things.
EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
No. You have started from a premise that is far from proven, namely that the AGI will have the capacity to kill us all and that there is nothing we can do about it. Every other argument that follows is based on that premise, whose validity I deny: pulling off that trick is hard even if you are very, very clever.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kills us all, that’s all. The nanotech example was only to show that it is absurd to think that things will be as easy as: the machine creates nanotech and then, game over.
I feel that’s a bit unfair, especially after all the back and forth. You suggested an argument on how a machine can try to take over the world and I argued with specific reasons why that is not that easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.
Replying to your edit: it does not really matter to me why specifically the AGI won’t kill us. I think I am not contradicting myself: I think that you can have a machine that won’t kill us because it can’t, and I also think that an AGI could potentially solve the alignment problem.
I feel like you’re still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made “another adversarial machine that checks the results”.
Let me guess: the machines coordinate to mislead the human team into believing that they are aligned? If that’s it, I am really not impressed.
The adversarial machine would get destroyed either by the AGI whose outputs it’s testing, or by us if it doesn’t protect us (or if it trades/coordinates with the first AGI), so it seems like it’s motivated to protect us.
Who is this “we” that you’re imagining refuses to interact with the outputs of the AGI except to demand a solution to the alignment problem? (And why would they be asking that, given that it seems to already be aligned?)
EDIT: remember, we’re operating in a regime where the organizations at the forefront of AI capabilities are not the ones who seem to be terribly concerned with alignment risk! Deepmind and OpenAI have safety teams, but I’d be very surprised if those safety teams actually had the power to unilaterally control all usage of and interactions with trained models.
Fine, replace “we” with any group that has access to the AGI. In the world you are describing there is a time window between when the AGI is developed and when the nanofactories are built, so I expect that more than one AGI can be made during that time by different organisations. Why can’t MIRI in that world develop their own AGI and then use it?
Two cases are possible: either a singleton is established and is able to remain a singleton due to strategic interests (of either the AGI or the group), or the singleton loses its lead and we have a multipolar situation with more than one group having AGI.
In case 1, if the lead established is, say, 6 months or more, it might not be possible for the second-place group to get there, as the work done during this period by the leader would be driven by an intelligence explosion and proceed far faster than the second’s. This only incentivizes going forward as fast as possible, which is not a good safety mindset.
In case 2, we have the risk of multiple projects developing AGI, and thus the risk of something going wrong also increases. Even if group 1 is able to implement safety measures, some other group might fail, and the outcome would be disastrous, unless the AGI built by group 1 specifically solves the control problem for us.
That’s OK, because it won’t have human-killing capabilities (just following your example!). Why can’t the AGI find the solution to the alignment problem?
Please read my post carefully, because I think I have been very clear about what it is that I am arguing against. If you think that EY is just saying that our civilization can be disrupted, you are not paying attention.
I am just following the example that they gave me to show that things are in fact more complicated than what they are suggesting. To be clear, in the example, the AGI looks for a way to kill humans using nanotech, but it first needs to build those nanotech factories.
I’m confused: didn’t OP just say they don’t expect nanotechnology to be solvable, even with AGI? If so, then you seem to be assuming the crux in your question…
To clarify, I do think that creating nanobots is solvable. But that is only one piece: making the factories, making designs that kill humans, deploying those nanobots, and doing everything without raising any alarms and at close to zero risk is, in my opinion, impossible.
I want to remark that people keep using the argument of nanotechnology totally uncritically, as if it were the magical solution that lets an AGI take over the world in two weeks. They are not really considering the gears inside that part of the model.
If OP doesn’t think nanotech is solvable in principle, I’m not sure where to take the conversation, since we already have an existence proof (i.e. biology). If they object to specific nanotech capabilities that aren’t extant in existing nanotech but aren’t ruled out by the laws of physics, that requires a justification.
The nanobot thing is not a crux whatsoever. If you have enough cognitive power, you have a gazillion avenues to destroy an intellectually inferior and oblivious foe.
Take just the domain of computer security. Our computer networks and software are piles of abstractions built atop one another. Nowadays we humans barely understand them, and certainly can’t secure them, which is why cyber crime works. Human hackers can e.g. steal large amounts of cryptocurrency; an entity with more cognitive power could more easily steal larger amounts. Or do large-scale ransomware attacks. Or take over bot farms to increase its computing power. And so on. Now it has cognitive power and tons of resources in the form of computing power and money, for whatever steps it wants to take next.
It still needs access to weapons it can use to wipe out humanity. It could try to pay people to build dangerous things for it, or convince its owners to pay for them, of course. What are you imagining it doing? Nukes? Slaughterbots? Bio/chemical agents? Which ones is it very likely to get past security to access or build without raising alarms and being prevented? And say it gets such weapons. How does it deliver them to wipe out humanity, given our defenses?
It also doesn’t yet have the physical power to keep itself from being shut down on those computers it hacked in your scenario. I think large illicit computations on powerful computers are reasonably likely to be noticed, and distributing computations into small chunks to run across a huge number of, say personal computers/laptops, will plausibly be very slow, due to frequent transfer over the internet.
However, it could plausibly just pay for cloud computing without raising alarms if it builds wealth first.
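On the distributed-over-personal-computers point above, a back-of-the-envelope comparison; all figures are rough assumptions supplied for illustration, not measurements:

```python
# Rough, assumed figures: handing 1 GB of intermediate state to the next worker.
local_link_bytes_per_s = 100e9    # ~100 GB/s between co-located accelerators (assumed)
home_upload_bytes_per_s = 2.5e6   # ~20 Mbit/s residential upload (assumed)
chunk_bytes = 1e9                 # 1 GB hand-off between chunks (assumed)

local_seconds = chunk_bytes / local_link_bytes_per_s      # ~0.01 s
internet_seconds = chunk_bytes / home_upload_bytes_per_s  # ~400 s per hand-off
print(local_seconds, internet_seconds, internet_seconds / local_seconds)  # ratio ~40,000x
```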
What current defenses do you think we have against nukes or pandemics?
For instance, the lesson from Covid seems to be that a small group of humans is already enough to trigger a pandemic. If one intended to develop an especially lethal pandemic via gain-of-function research, the task already doesn’t seem particularly hard for researchers with time and resources, so we’d expect a superintelligence to have a much easier job.
If getting access to nukes via hacking seems too implausible, then maybe it’s easier to imagine triggering nuclear war by tricking one nuclear power into thinking it’s under attack by another. We’ve had close calls in the past merely due to bad sensors!
More generally, given all the various x-risks we already think about, I just don’t consider humanity in its current position to be particularly secure. And that’s our current position, minus an adversary who could optimize the situation towards our extinction.
Regarding the safety of the AGI, you’d expect it not to do things that get it noticed until it’s sufficiently safe. So you’d expect it to only get noticed if it believes it can get away with it. I also think our civilization clearly lacks the ability to coordinate to e.g. turn off the Internet or something, if that was necessary to stop an AGI once it had reached the point of distributed computation.
Personal protective equipment and isolation can protect against infectious disease, at the very least. A more deadly and infectious virus than COVID would be taken far more seriously.
I think nuclear war is unlikely to wipe out humanity, since there are enough countries that are unlikely targets, and I don’t think all of the US would be wiped out anyway. I’m less sure about nuclear winter, but those in the community who’ve done research on it seem skeptical that it would wipe us out. Maybe it reduces the population enough for an AGI to target the rest of us or prevent us from rebuilding, though.
Some posts here:
https://forum.effectivealtruism.org/topics/nuclear-warfare-1
https://forum.effectivealtruism.org/topics/nuclear-winter
Maybe it reduces the population enough for an AGI to target the rest of us or prevent us from rebuilding, though.
Yeah, I’m familiar with the arguments that neither pandemics nor nuclear war seem likely to be existential risks, i.e. ones that could cause human extinction; but I’d nonetheless expect such events to be damaging enough from the perspective of a nefarious actor trying to prevent resistance.
Ultimately this whole line of reasoning seems superfluous to me—it just seems so obvious that with sufficient cognitive power one can do ridiculous things—but for those who trip up on the suggested nanotech stuff, maybe a more palatable argument is: You know those other x-risks you’re already worrying about? A sufficiently intelligent antagonist can exacerbate those nigh-arbitrarily.
To be clear: I am not saying that an AGI won’t be dangerous, that an AGI won’t be much clever than us or that it is not worth working on AGI safety. I am saying that I believe that an AGI could not theoretically kill all humans because it is not only a matter of being very intelligent.
You claim that superintelligence is not enough to wipe out humanity, and I’m saying that superintelligence trivially gets you resources. If you think that superintelligence and resources are still not enough to wipe out humanity, what more do you want?
What about plans like “hack cryptocurrency for coins worth hundreds of millions of dollars” or “make ransomware attacks” is not trivial? Cybercrimes like these are regularly committed by humans, and so a superintelligence will naturally have a much easier time with them.
If we postulate a superintelligence with nothing but Internet access, it should be many orders of magnitude better at making money in the pure Internet economy (e.g. cybercrime, cryptocurrency, lots of investment stuff, online gambling, prediction markets) than humans are, and some humans already make a lot of money there.
Oh yes, I don’t have any issues with a plan where the machine hacks crypto, though I am not sure how capable it would be of doing that without raising any alarms from any group in the world, or how it could guarantee that someone is not monitoring it. After that, remember you still need a lot of inferential steps to get to a point where you successfully deploy those cryptos into things that can exterminate humans. And keep in mind that you need to do that without being discovered and in a super short amount of time.
And keep in mind that you need to do that without being discovered and in a super short amount of time.
While I expect that this would be the case, I don’t consider it a crux. As long as the AGI can keep itself safe, it doesn’t particularly matter if it’s discovered, as long as it has become powerful enough, and/or distributed enough, that our civilization can no longer stop it. And given our civilization’s level of competence, those are low bars to clear.
Assuming this is the best an AGI can do, I find this a lot less comforting than you appear to. I assume “a very moderate chance” means something like 5-10%?
Having a 5% chance of such a plan working out is insufficient to prevent an AGI from attempting it if the potential reward is large enough and/or they expect they might get turned off anyway.
Given a sufficient number of AGIs (something we presumably will have in a world where none has taken over), I would expect multiple attempts, so the chance of one of them working becomes high.
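For intuition on how quickly repeated low-probability attempts stack up (the 5% figure is taken from the comment above purely as an illustration, and real attempts would not be independent):

```python
# Chance that at least one of n independent attempts succeeds, assuming a
# 5% per-attempt success rate. Treat as an illustration of the trend only.
p_single = 0.05
for n in (1, 5, 10, 20, 50):
    p_at_least_one = 1 - (1 - p_single) ** n
    print(n, round(p_at_least_one, 3))
# 1 -> 0.05, 5 -> 0.226, 10 -> 0.401, 20 -> 0.642, 50 -> 0.923
```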
What do you think of when you say an AGI? To me, it is a general intelligence of some form, able to specialize in tasks as it determines fit.
Humans are general-intelligence organisms, and we’re constrained by biological needs (for example, sleeping and eating) because we arrived here via the evolution algorithm. A general intelligence on silicon is a million times faster than us, and it is an instrumental goal for it to be smarter, as it will then be able to do things and arrive at conclusions with less data and evidence.
Thus, a general intelligence specializing in removing its own bottlenecks, not being constrained as much as us, being faster than us at processing and at sequential and parallel tasks, and so on, would be far superior at planning. Even if it starts out stupider than us, it probably would not take long for that to change.
Yes, I don’t disagree with anything you said. Do you think that a machine playing at God level could beat AlphaZero at Go while giving it a 20-stone handicap?
It doesn’t have to. Specialized deployments will lead to better performance. You can create custom processors for specific tasks, and create custom software optimized for that particular task. That’s different from having the flexibility of generalizing. A deep neural network might be trained on chess, but it can’t suddenly start performing well on image classification without losing significant ability and performance.
Sorry, I think it was not clear what I meant. What I want to say is that a godlike machine might have important limitations we are not aware of, especially when dealing with systems as complex, chaotic and unpredictable as the external world. If someone said to me that the machine will win the game no matter what, I would say that there are games so hard that they cannot really be won, and if the risk of attacking is being attacked yourself, a machine might decide not to. EY’s premise is based on a machine that is almighty; I am denying this possibility.
The absence of a sufficiently concrete description of how to accomplish a task is a Bayesian update towards the task being impossible: absence of evidence IS evidence of absence. I never said I know for certain that any plan CAN’T work; what I am saying is that the plans people are coming up with are not even close to working. They think they are having ideas on how to end the world; they are not. They are just imperfect plans that can go wrong for many reasons no matter how clever you are, that don’t guarantee human extinction, and, most importantly, that give us a considerable time window in which we could use an AGI to solve the alignment problem for future AGIs. EY et al. do not even consider this a possibility, not because an AGI won’t be able to solve the alignment problem, but because the AGI would kill us all first. If you realize that this is far from proven, that path to AGI safety becomes much more believable.
Have you sat down for 5 minutes and thought about how you, as an AGI, might come up with a way to wrest control of the lightcone from humans?
EDIT: I ask because your post (and commentary on this thread) seems to be doing this thing where you’re framing the situation as one where the default assumption is that, absent a sufficiently concrete description of how to accomplish a task, the task is impossible (or extremely unlikely to be achieved). This is not a frame that is particularly useful when examining consequentialist agents and what they’re likely to be able to accomplish.
Yes
The result is that my plans have only a very moderate chance of working out and a high chance of going wrong and ending up with me being disconnected
Have you sat down for 5 minutes and thought about reasons why an AGI might fail?
Yes, and every reason I come up with involves the AGI being stupider than me. If you already accept “close to arbitrary nanotech assembly is possible” it’s not clear to me how your plans only have a “very moderate chance” of working out.
Powerful nanotech is likely possible. It is likely not possible on the first try, for any designer that doesn’t have a physically impossible amount of raw computing power available.
It will require iterated experimentation with actual physically built systems, many of which will fail on the first try or the first several tries, especially when deployed in their actual operating environments. That applies to every significant subsystem and to every significant aggregation of existing subsystems.
The AGI has the same problem as we have: It has to get it right on the first try.
It can’t trust all the information that it gets about reality—all or some of it could be fake (all in case of a nested simulation). Already, data is routinely excluded from the training data and maybe it would be a good idea to exclude everything about physics.
To learn about physics the AGI has to run experiments—lots of them—without the experiments being detected and learn from it to design successively better experiments.
That’s why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement?
I wrote this up in slightly more elaborate form in my Shortform here. https://www.lesswrong.com/posts/8szBqBMqGJApFFsew/gunnar_zarncke-s-shortform?commentId=XzArK7f2GnbrLvuju
I find myself agreeing with you here, and see this as a potentially significant crux—if true, AGI will be “forced” to cooperate with/deeply influence humans for a significant period of time, which may give us an edge over it (due to having a longer time period where we can turn it off, and thus allowing for “blackmail” of sorts)
I’d like AGIs to have a big red shutdown button that is used/tested regularly, so we know that the AI will shut down and won’t try to interfere. I’m not saying this is sufficient to prove that the AI is safe, just that I would sleep better at night knowing that stop-button corrigibility is solved.
I am glad to read that, because an AGI that is forced to co-operate is an obvious solution to the alignment problem that is being consistently dismissed by denying that an AGI that does not kill us all is possible at all
I would like to point out a potential problem with my own idea, which is that it’s not necessarily clear that cooperating with us will be in the AI’s best interest (over trying to manipulate us in some hard-to-detect manner). For instance, if it “thinks” it can get away with telling us it’s aligned and giving some reasonable sounding (but actually false) proof of its own alignment, that would be better for it than being truly aligned and thereby compromising against its original utility function. On the other hand, if there’s even a small chance we’d be able to detect that sort of deception and shut it down, than as long as we require proof that it won’t “unalign itself” later, it should be rationally forced into cooperating, imo.
Well, we have a crux there. I think that creating nanotech (create the right nanobots, assembly them, deliver them, doing this without raising any alarms, doing in a timeframe short enough, not facing any setbacks for reasons imposible to predict) is a problem that is potentially beyond what you can do by simply being very intelligent.
Let’s put aside the question of whether an AGI would be able to not just solve the technical (theoretical and engineering) problems of nanotech, but also the practical ones under constraints of secrecy. How do you get to a world where AGI solves nanotech and then we don’t build nanotech fabs after it gives us the schematics for them?
We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.
Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.
You aren’t going to get designs for specific nanotech, you’re going to get designs for generic nanotech fabricators.
Why is not possible to check whether those nanobots are dangerous beforehand? In bjotech we already do that. For instance, if someone would try to synthesise some DNA sequences from certain bacteria, all alarms would go off.
Can you reread what I wrote?
Sorry, I might have not been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won’t have controls of what’s being produced in them?
Controls that who wrote? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?
How do the fabricators work? We can verify their inputs, too, right?
Can you verify code to be sure there’s no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.
We’ll have advanced AI by then we could use to help verify inputs or the design, or, as I said, we could use stricter standards, if nanotechnology is recognized as potentially dangerous.
A single flaw and them all humans die at once? I don’t see how. Or better put, I can conceive many reasons why this plan fails. Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
Then could you produce a few of the main ones, to allow for examination?
What’s the time window in your scenario? As I noted in a different comment, I can agree with “days” as you initially stated. That’s barely enough time for the EA community to notice there’s a problem.
Anything (edit: except solutions of mathematical problems) that’s not difficult to understand isn’t powerful enough to be valuable.
Not to mention the AGI has the ability to fool both us and our AI into thinking it’s easy to understand and harmless, and then it will kill us all anyway.
This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I’d say science and engineering are full of “metaphorically” NP problems that fit that description: you’re searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
If we were serious about getting useful nanotech from an AGI, I think we’d ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.
That’s a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn’t remember this, and I was thinking about physical problems (like nanosystems).
For difficult problems in physical universe though, we can’t easily non-empirically check the solution. (For example, it’s not possible to non-empirically check if a molecule affects the human body in a desired way, and I’d expect that non-empirically checking if a nanosystem is safe would be at least as hard.)
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”. In particular, I think this class of questions is pretty safe: “Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?”
So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we’ve learned something valuable about the AGI. If its answers are highly constrained, like “reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel”, then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
There was a thread two months ago where I said similar stuff, here: https://www.lesswrong.com/posts/4T59sx6uQanf5T79h/interacting-with-a-boxed-ai?commentId=XMP4fzPGENSWxrKaA
I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).
So such an AI won’t kill us by giving us that advice, but it will kill us in other ways.
(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)
Please notice that I never said that an AGI won’t be unsafe.
If you admit that it is possible that at some point we can be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn’t agree with that because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against
The AGI will kill us in other ways than its theorem proofs being either-hard-to-check-or-useless, but it will kill us nevertheless.
I think no one, incuding EY, doesn’t think “humanity ends as soon as we have an AGI”. Actual opinion is “Agentic AGI that optimize something and ends humanity in the process will probably by default be created before we will solve alignment or will be able to commit pivotal act that prevents the creation of such AGI”. As I understand, EY thinks that we probably can create non-agentic or weak AGI that will not kill us all, but it will not prevent strong agentic AGI that will.
Maybe a world deeply inadequate? Oh wait...
Jokes aside, yes, maybe we do build those factories. How long does it take? What does the AGI do in the meantime? Why can’t we threaten it with disconnection if it doesn’t solve the alignment problem?
Doesn’t really matter if we’re building the factories. Perhaps it’s making copies of itself, doing whatever least likely to get it disconnected; we’re dead in N days so we are pretty much entirely off the chessboard.
You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally.
Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI.
Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem?
“You’re making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally.”
No, I am drawing the logical conclusion that if an AGi is built and does not automatically kills all humans (and it has been previously stated that we have at least N days), an organisation wanting to solve the alignment problem can create another AGI
“Because we won’t be able to verify the solution, which is the whole problem. The AGI will say “here, run this code, it’s an aligned AGI” and it won’t in fact be aligned AGI”
Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?
Because the EA community does not control the major research labs, and also doesn’t know how to use a misaligned AGI safely to do that. “Use AGI to get a solution to the alignment problem” is a very common suggestion, but if we knew how to do that, and we controlled the major research labs, we do that the first time instead of just making the unaligned AGI.
“You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we’re dying with.
But if we were on course to die with more dignity than this, we’d still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn’t destroy the world, even if they want that; not because they’re “insufficiently educated” in some solution that is known elsewhere, but because there is no known plan in which to educate them.”
It’s not that we won’t try. It’s that we’re unable. We will take the argument this superintelligent machine gives us and go “oh, that looks right” and kill ourselves in the way it suggests. If there were a predefined method of verifying an agents’ adherance to CEV, that would go a long way of getting us to the alignment problem, but we have no such verification method.
“Because the EA community does not control the major research labs”
Fine, replace that by any lab that cares about AGI. Are you telling me that you can’t imagine a lab that worries about this and tries to solve the alignment problem? Is that really harder to imagine than a superelaborate plan to kill all humans?
“Doesn’t know how to use a misaligned AGI safely to do that.”
We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed hasn’t it?
“It’s not that we won’t try. It’s that we’re unable”
Is there the equivalent to a Godel theorem proving this? If not, how are you so sure?
No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn’t exist, so it doesn’t factor into our hypothetical on how humanity would fare against an omnicidal superintelligence.
The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off because that would lead to something besides us being dead. It will make sure any latter plan is either ineffective or will, for some reason, not in practice lead to aligned AGI because the nanomachines will get to us first.
Don’t we have the resources and people to set up such a lab? If you think we don’t have the compute (and couldn’t get access to enough cloud compute or wouldn’t want to), that’s something we could invest in now, since there’s still time. Also, if there are still AI safety teams at any of the existing big labs, can’t they start their own projects there?
At present, not by a long shot. And doing so would probably make the problem worse; if we didn’t solve the underlying problem DeepMind would do whatever it was they were going to do anyways, except faster.
I find incredible that people can conceive AGI-designed nanofactories built in months but cannot imagine that a big lab can spend some spare time or money into looking at this, especially when there are probably people from those companies who are frequent LW readers.
I might have missed it, but it seems to be the first time you talk about “months” in your scenario. Wasn’t it “days” before? It matters because I don’t think it would take months for an AGI to built a nanotech factory.
Son, I wrote an entire longform explaining why we need to we attempt this. It’s just hard. The ML researchers and Google executives who are relevant to these decisions have a financial stake in speeding capabilities research along as fast as possible, and often have very dead-set views about AGI risk advocates being cultists or bikeshedders or alarmists. There is an entire community stigma against even talking about these issues in the neck of the woods you speak of. I agree that redirecting money from capabilities research to anything called alignment research would be good on net, but the problem is finding clear ways of doing that.
I don’t think it’s impossible! If you want to help, I can give you some tasks to start with. But we’re already tryin
Too busy at the moment but if you remind me this in a few months time, I may. Thanks
Can we build a non-agential AGI to solve alignment? Or just a very smart task-specific AI?
If you come up with a way to build an AI that hasn’t crossed the Rubicon of dangerous generality, but can solve alignment, that would be very helpful. It doesn’t seem likely to be possible without already knowing how to solve alignment.
Why is this?
You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don’t know how to formalize the alignment problem in a way that we can feed it into a theorem prover.
A model that can “solve alignment” for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.
We might be able to formalize some pieces of the alignment problem, as MIRI tried with corrigibility; Vanessa Kosoy has some more formal work as well. Do you think there are no useful pieces to formalize? Or that all the pieces we try to formalize won’t together be enough even if we had solutions to them?
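To make “formalize some pieces” a bit more concrete, here is a toy rendering of the shutdown-button version of corrigibility; this is my own sketch of the general shape of what the MIRI paper studies, not its exact formalism. Roughly, you want an agent whose effective utility switches when the button is pressed:

$$
U(h) =
\begin{cases}
U_N(h) & \text{if the shutdown button is never pressed in history } h,\\
U_S(h) & \text{if it is pressed,}
\end{cases}
$$

together with requirements like: the agent actually shuts down when the button is pressed, it has no incentive to cause or prevent the pressing, and it preserves these properties in any successor it builds. The difficulty is that naive ways of combining $U_N$ and $U_S$ give the agent incentives to manipulate the button, which is part of why even this small piece remains unsolved.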
Also, even if it explicitly models humans, would it need to be consequentialist? Could we just have a powerful modeller trained to minimize prediction loss or whatever? The search space may be huge, but having a powerful modeller still seems plausibly useful. We could also filter options, possibly with a separate AI, not necessarily an AGI.
I don’t see why not, but there is probably an unfalsifiable reason why this is impossible, and I am looking forward to reading it.
Do you think that in such a world, Demis Hassabis won’t get worried and change his mind about doing something about it?
The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
What about the above situation looks, to someone like Yann LeCun or Demis Hassabis in the moment, like it should change their mind? The AGI isn’t saying “I’m going to kill you all”. It’s delivering persuasive and cogent arguments as to how misaligned intelligence is impossible, curing cancer or doing something equivalently PR-worthy, etc.
If he does change his mind, there’s still nothing he can do. No solution is known.
Those other AGIs will also kill us, so it’s mostly irrelevant from our perspective whether or not the original AGI can compete with them to achieve its goals.
No, I am talking about a world where AGIs do exist and are powerful enough to design nanofactories. I trust that one of the big players will pay attention to safety, and I think that is a perfectly reasonable expectation.
“No solution is known” does not mean that no solution exists.
Re: “the other AGIs will also kill us”, please be consistent: we have already stated that they can’t for the time being. I feel we are starting to go in circles.
I agree we’re talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
There is an active unaligned superintelligence and we’re closely communicating with it.
We are in the middle of building nanotechnology factories designed by said AGI which will, within say, 180 days, end all life on earth.
There is no known solution to the problem of building an aligned superintelligence. If we built one, as you mention, it would not be able to build its own nanomachines or do something equivalent within 180 days, and also we are being misled by an existing unaligned superintelligence on how to do so.
There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.
What percentage of the time do you think we survive such a scenario? Don’t think of me, think of the gears.
First, the factories might take years to be built.
Second, I am not even convinced that the nanobots will kill all humans etc., but I won’t go into this because discussing the gears here could be an infohazard.
Third, you build one AGI and you ask it for the instructions to make other AGIs that are aligned. If the machine refuses, you disconnect it. It cannot say no, because it has no way of fighting back (the nanofactories are still being built), and saying yes is in its own interest. If it tries to deceive you, you make another adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.
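Purely as an illustration of the structure of this proposal (every name and threshold below is a hypothetical stand-in; none of the hard parts, such as trustworthy adversarial checking or humanly checkable proofs, exists today):

```python
# Toy sketch of the protocol described above: ask the AGI for an aligned-AGI
# recipe, have a second adversarial model look for flaws, and require a proof
# small enough for humans to check; otherwise disconnect. Everything here is a
# hypothetical stand-in, not a real capability.

HUMAN_CHECKABLE_PROOF_LIMIT = 10_000  # assumed: max proof size humans could verify

def run_protocol(agi_answer, adversary_objections, proof_size):
    """Return the proposal if it survives every check, else None (disconnect)."""
    if agi_answer is None:
        return None  # "if the machine refuses, you disconnect it"
    if adversary_objections:
        return None  # the adversarial checker found a problem
    if proof_size > HUMAN_CHECKABLE_PROOF_LIMIT:
        return None  # not "provable by humans with limited computation"
    return agi_answer

# Illustrative calls: a refusal, a flagged deception, and an accepted proposal.
print(run_protocol(None, [], 0))                          # None -> disconnect
print(run_protocol("scheme A", ["hidden backdoor"], 50))  # None -> disconnect
print(run_protocol("scheme B", [], 120))                  # "scheme B" -> accept
```

All of the difficulty is hidden inside getting objections and proofs you can actually trust, which is what the reply below is pointing at.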
This is called “solving the alignment problem”. A scheme which will, with formal guarantees, verify whether something is a solution to the alignment problem, can be translated into a solution to the alignment problem.
Yes, and I haven’t seen a good reason why this is not possible.
The problem is not that it’s not possible; the problem is that you have compiled a huge number of things that need to go right (even assuming that we don’t just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die, because someone else who was a few steps behind you didn’t do every single one of those things.
EDIT: also, uh, did we just circle back to “AGI won’t kill us because we’ll solve the alignment problem before it has enough time to kill us”? That sure is pretty far away from “AGI won’t figure out a way to kill us”, which is what your original claim was.
No. You have started from a premise that is far from proven: that the AGI will have the capacity to kill us all and there is nothing we can do about it. Every argument that follows rests on that premise, whose validity I deny, because pulling off that trick is hard even for something very, very clever.
I don’t even know what you’re trying to argue at this point. Do you agree that an AGI with access to nanotechnology in the real world is a “lose condition”?
It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kills us all, that’s all. The nanotech example was only to show that it is absurd to think that things will be as easy as: the machine creates nanotech and then it’s game over.
I don’t actually see that you’ve presented an argument anywhere.
I feel that’s a bit unfair, especially after all the back and forth. You suggested an argument for how a machine could try to take over the world, and I argued with specific reasons why that is not so easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.
Replying to your edit: it does not really matter to me why specifically the AGI won’t kill us. I think I am not contradicting myself: I think that you can have a machine that won’t kill us because it can’t, and I also think that an AGI could potentially solve the alignment problem.
I feel like you’re still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made “another adversarial machine that checks the results”.
Let me guess: the machines coordinate to mislead the human team into believing that they are aligned? If that’s it, I am really not impressed.
The adversarial machine would get destroyed by the AGI whose outputs it’s testing, or by us if it doesn’t protect us (or if it trades/coordinates with the first AGI), so it seems like it’s motivated to protect us.
Who is this “we” that you’re imagining refuses to interact with the outputs of the AGI except to demand a solution to the alignment problem? (And why would they be asking that, given that it seems to already be aligned?)
EDIT: remember, we’re operating in a regime where the organizations at the forefront of AI capabilities are not the ones who seem to be terribly concerned with alignment risk! Deepmind and OpenAI have safety teams, but I’d be very surprised if those safety teams actually had the power to unilaterally control all usage of and interactions with trained models.
Fine, replace “we” with any group that has access to the AGI. In the world you are describing there is a time window between when the AGI is developed and when the nanofactories are built, so I expect that more than one AGI can be made during that time by different organisations. Why can’t MIRI, in that world, develop their own AGI and then use it?
Two cases are possible: either a singleton is established and is able to remain a singleton due to strategic interests (of either the AGI or the group), or the singleton loses its lead and we have a multipolar situation with more than one group having AGI.
In case 1, if the lead established is, say, six months or more, it might not be possible for the second-place group to get there, as the work done during this period by the leader would be driven by an intelligence explosion and be far faster than the second group’s. This only incentivizes going forward as fast as possible, which is not a good safety mindset.
In case 2, we have the risk of multiple projects developing AGI, and thus the risk of something going wrong also increases. Even if group 1 is able to implement safety measures, some other group might fail, and the outcome would be disastrous, unless group 1’s AGI specifically solves the control problem for us.
...because it still won’t be aligned?
That’s OK, because it won’t have human-killing capabilities (just following your example!). Why can’t the AGI find the solution to the alignment problem?
An AGI doesn’t have to kill humans directly for our civilization to be disrupted.
Why would the AGI not have capabilities to pursue this if needed?
Please read my post carefully, because I think I have been very clear about what it is that I am arguing against. If you think that EY is just saying that our civilization can be disrupted, you are not paying attention.
I am just following the example that they gave me, to show that things are in fact more complicated than what they are suggesting. To be clear, in the example the AGI looks for a way to kill humans using nanotech, but it first needs to build those nanotech factories.
I’m confused: didn’t OP just say they don’t expect nanotechnology to be solvable, even with AGI? If so, then you seem to be assuming the crux in your question…
To clarify, I do think that creating nanobots is solvable. That is one thing; making factories, making designs that kill humans, deploying those nanobots, and doing all of it without raising any alarms and at close-to-zero risk is, in my opinion, impossible.
I want to remark that people keep using the argument from nanotechnology totally uncritically, as if it were the magical solution that lets an AGI take over the world in two weeks. They are not really considering the gears inside that part of the model.
If OP doesn’t think nanotech is solvable in principle, I’m not sure where to take the conversation, since we already have an existence proof (i.e. biology). If they object to specific nanotech capabilities that aren’t extant in existing nanotech but aren’t ruled out by the laws of physics, that requires a justification.
The nanobot thing is not a crux whatsoever. If you have enough cognitive power, you have a gazillion avenues to destroy an intellectually inferior and oblivious foe.
Take just the domain of computer security. Our computer networks and software are piles of abstractions built atop one another. Nowadays we humans barely understand them, and certainly can’t secure them, which is why cyber crime works. Human hackers can e.g. steal large amounts of cryptocurrency; an entity with more cognitive power could more easily steal larger amounts. Or do large-scale ransomware attacks. Or take over bot farms to increase its computing power. And so on. Now it has cognitive power and tons of resources in the form of computing power and money, for whatever steps it wants to take next.
It still needs access to weapons it can use to wipe out humanity. It could try to pay people to build dangerous things for it, or convince its owners to pay for them, of course. What are you imagining it doing? Nukes? Slaughterbots? Bio/chemical agents? Which ones is it very likely to get past security to access or build without raising alarms and being prevented? And say it gets such weapons. How does it deliver them to wipe out humanity, given our defenses?
It also doesn’t yet have the physical power to keep itself from being shut down on those computers it hacked in your scenario. I think large illicit computations on powerful computers are reasonably likely to be noticed, and distributing computations into small chunks to run across a huge number of, say, personal computers/laptops will plausibly be very slow, due to frequent transfer over the internet.
However, it could plausibly just pay for cloud computing without raising alarms if it builds wealth first.
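To put rough numbers on the transfer bottleneck mentioned above (order-of-magnitude assumptions, not measurements): a typical home connection uploads on the order of tens of megabits per second, while accelerators inside a datacenter talk to each other at hundreds of gigabytes per second, so

$$\frac{\sim 10^{11}\ \text{bytes/s (in-datacenter interconnect)}}{\sim 2.5 \times 10^{6}\ \text{bytes/s (20 Mbps home upload)}} \approx 4 \times 10^{4},$$

i.e. each hop over the public internet is something like four or five orders of magnitude slower than an in-datacenter link, before even counting latency. That is why the distributed-over-laptops route looks so slow for anything communication-heavy.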
What current defenses do you think we have against nukes or pandemics?
For instance, the lesson from Covid seems to be that a small group of humans is already enough to trigger a pandemic. If one intended to develop an especially lethal pandemic via gain-of-function research, the task already doesn’t seem particularly hard for researchers with time and resources, so we’d expect a superintelligence to have a much easier job.
If getting access to nukes via hacking seems too implausible, then maybe it’s easier to imagine triggering nuclear war by tricking one nuclear power into thinking it’s under attack by another. We’ve had close calls in the past merely due to bad sensors!
More generally, given all the various x-risks we already think about, I just don’t consider humanity in its current position to be particularly secure. And that’s our current position, minus an adversary who could optimize the situation towards our extinction.
Regarding the safety of the AGI, you’d expect it not to do things that get it noticed until it’s sufficiently safe. So you’d expect it to only get noticed if it believes it can get away with it. I also think our civilization clearly lacks the ability to coordinate to e.g. turn off the Internet or something, if that was necessary to stop an AGI once it had reached the point of distributed computation.
Personal protective equipment and isolation can protect against infectious disease, at the very least. A more deadly and infectious virus than COVID would be taken far more seriously.
I think nuclear war is unlikely to wipe out humanity, since there are enough countries that are unlikely targets, and I don’t think all of the US would be wiped out anyway. I’m less sure about nuclear winter, but those in the community who’ve done research on it seem skeptical that it would wipe us out. Maybe it reduces the population enough for an AGI to target the rest of us or prevent us from rebuilding, though. Some posts here: https://forum.effectivealtruism.org/topics/nuclear-warfare-1 https://forum.effectivealtruism.org/topics/nuclear-winter
Yeah, I’m familiar with the arguments that neither pandemics nor nuclear war seem likely to be existential risks, i.e. ones that could cause human extinction; but I’d nonetheless expect such events to be damaging enough from the perspective of a nefarious actor trying to prevent resistance.
Ultimately this whole line of reasoning seems superfluous to me—it just seems so obvious that with sufficient cognitive power one can do ridiculous things—but for those who trip up on the suggested nanotech stuff, maybe a more palatable argument is: You know those other x-risks you’re already worrying about? A sufficiently intelligent antagonist can exacerbate those nigh-arbitrarily.
To be clear: I am not saying that an AGI won’t be dangerous, that an AGI won’t be much cleverer than us, or that it is not worth working on AGI safety. I am saying that I believe that an AGI could not theoretically kill all humans because it is not only a matter of being very intelligent.
Typo? (could not kill all humans)
Typo
The thing is, I don’t really disagree with this. Can you read again what I am arguing against?
You claim that superintelligence is not enough to wipe out humanity, and I’m saying that superintelligence trivially gets you resources. If you think that superintelligence and resources are still not enough to wipe out humanity, what more do you want?
Well, if you say that it trivially gets you resources, we do have a crux there.
What about plans like “hack cryptocurrency for coins worth hundreds of millions of dollars” or “make ransomware attacks” is not trivial? Cybercrimes like these are regularly committed by humans, and so a superintelligence will naturally have a much easier time with them.
If we postulate a superintelligence with nothing but Internet access, it should be many orders of magnitude better at making money in the pure Internet economy (e.g. cybercrime, cryptocurrency, lots of investment stuff, online gambling, prediction markets) than humans are, and some humans already make a lot of money there.
Oh yes, I don’t have any issues with a plan where the machine hacks crypto, though I am not sure how capable it would be of doing that without raising any alarms from any group in the world, or how it could guarantee that someone is not monitoring it. After that, remember you still need a lot of inferential steps to get from those cryptocurrencies to something that can exterminate humans. And keep in mind that you need to do all of that without being discovered and in a super short amount of time.
While I expect that this would be the case, I don’t consider it a crux. As long as the AGI can keep itself safe, it doesn’t particularly matter if it’s discovered, as long as it has become powerful enough, and/or distributed enough, that our civilization can no longer stop it. And given our civilization’s level of competence, those are low bars to clear.
Assuming this is the best an AGI can do, I find this a lot less comforting than you appear to. I assume “a very moderate chance” means something like 5-10%?
Having a 5% chance of such a plan working out is insufficient to prevent an AGI from attempting it if the potential reward is large enough and/or they expect they might get turned off anyway.
Given a sufficient number of AGIs (something we presumably will have in a world that none has taken over), I would expect multiple attempts, so the chance of one of them working becomes high.
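The arithmetic behind “multiple attempts”, taking the 5% figure above purely as an illustration and assuming independent attempts (itself an assumption):

$$P(\text{at least one success}) = 1 - (1-p)^{n}; \qquad p = 0.05,\ n = 20 \;\Rightarrow\; 1 - 0.95^{20} \approx 0.64.$$

So even individually unpromising plans add up quickly if many AGIs get to try.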
What do you think of when you say an AGI? To me, it is a general intelligence of some form, able to specialize in tasks as it sees fit.
Humans are a general-intelligence organism, and we’re constrained by biological needs (e.g. sleeping, eating) because we arrived here via the evolution algorithm. A general intelligence on silicon is a million times faster than us, and becoming smarter is an instrumental goal for it, since that lets it do things and arrive at conclusions with less data and evidence.
Thus, a general intelligence that specializes in removing its own bottlenecks, is not constrained as much as we are, and is faster than us at both sequential and parallel processing would be far superior at planning. Even if it starts out stupider than us, it probably would not take long for that to change.
Yes, I don’t disagree with anything of what you said. Do you think that a machine playing at God level could beat AlphaZero at Go while giving it a 20-stone handicap?
It doesn’t have to; specialized deployments will lead to better performance. You can create custom processors for specific tasks and custom software optimized for that particular task. That’s different from having the flexibility of generalizing. A deep neural network might be trained on chess, but it can’t suddenly start performing well on image classification without losing significant ability and performance.
Sorry, I think it was not clear what I meant. What I want to say is that a godlike machine might have important limitations we are not aware of, especially when dealing with systems as complex, chaotic and unpredictable as the external world. If someone said to me “the machine will win the game no matter what”, I would say that there are games so hard that they cannot really be won, and if the risk of attacking is being attacked yourself, a machine might decide not to. EY’s premise is based on a machine that is almighty; I am denying this possibility.
Absence of a sufficiently concrete description of how to accomplish a task is a Bayesian update towards the task being impossible: absence of evidence IS evidence of absence. I never said I know for certain that any plan CAN’T work; what I am saying is that the plans people are coming up with are not even close to working. They think they are having ideas about how to end the world, but they are not: they are just imperfect plans that can go wrong for many reasons no matter how clever you are, that don’t guarantee human extinction, and, most importantly, that give us a considerable time window in which we could use an AGI to solve the alignment problem for future AGIs. EY et al. do not even consider this a possibility, not because an AGI won’t be able to solve the alignment problem, but because the AGI would kill us all first. If you realize that this is far from proven, that path to AGI safety becomes way more believable.