Hi, I’m the user who asked this question. Thank you for responding!
I see your point about how an AGI would intentionally destroy humanity versus engineered bugs that only wipe us out “by accident”, but that’s conditional on the AGI having “destroy humanity” as a subgoal. Most likely, a typical AGI will have some mundane, neutral-to-benevolent goal like “maximize profit by running this steel factory and selling steel”. Maybe the AGI can achieve that by taking over an iron mine somewhere, or taking over a country (or the world) and enslaving its citizens, or even wiping out humanity. In general, my guess is that the AGI will try to do the least costly/risky thing needed to achieve its goal (maximizing profit), and (setting aside that if all of humanity were extinct, the AGI would have no one to sell steel to) wiping out humanity is the most expensive of these options and the AGI would likely get itself destroyed while trying to do that. So I think that “enslave a large portion of humanity and export cheap steel at a hefty profit” is a subgoal that this AGI would likely have, but destroying humanity is not.
It depends on the use case—a misaligned AGI in charge of the U.S. Armed Forces could end up starting a nuclear war—but given how careful the U.S. government has been about avoiding nuclear war, I think they’d insist on an AGI being very aligned with their interests before putting it in charge of something so high stakes.
Also, I suspect that some militaries (like North Korea’s) might be developing bioweapons and spending anywhere from 1% to 100% as much on it annually as OpenAI and DeepMind spend on AGI; we just don’t know about it.
Based on your AGI-bioweapon analogy, I suspect that AGI is a greater hazard than bioweapons, but not by quite as much as your argument implies. While few well-resourced actors are interested in using bioweapons, a who’s who of corporations, states, and NGOs will be interested in using AGI. And AGIs can adopt dangerous subgoals for a wide range of goals (especially resource extraction), whereas bioweapons can basically only kill large groups of people.
[W]iping out humanity is the most expensive of these options and the AGI would likely get itself destroyed while trying to do that[.]
It would be pretty easy and cheap for something much smarter than a human to kill all humans. The classic scenario is:
A. [...] The notion of a ‘superintelligence’ is not that it sits around in Goldman Sachs’s basement trading stocks for its corporate masters. The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of proteins which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then… well, it doesn’t really matter from our perspective what comes after that, because from a human perspective any technology more advanced than molecular nanotech is just overkill. A superintelligence with molecular nanotech does not wait for you to buy things from it in order for it to acquire money. It just moves atoms around into whatever molecular structures or large-scale structures it wants.
Q. How would it get the energy to move those atoms, if not by buying electricity from existing power plants? Solar power?
A. Indeed, one popular speculation is that optimal use of a star system’s resources is to disassemble local gas giants (Jupiter in our case) for the raw materials to build a Dyson Sphere, an enclosure that captures all of a star’s energy output. This does not involve buying solar panels from human manufacturers, rather it involves self-replicating machinery which builds copies of itself on a rapid exponential curve -
If the smarter-than-human system doesn’t initially have Internet access, it will probably be able to get such access either by manipulating humans, or by exploiting the physical world in unanticipated ways (cf. Bird and Layzell 2002).
But also, if enough people have AGI systems it’s not as though no one will ever hook it up to the Internet, any more than you could give a nuke to every human on Earth and expect no one to ever use theirs.
Eliezer gives one example of a way to kill humanity with nanotech in his conversation with Jaan Tallinn:

[...] Killing all humans is the obvious, probably resource-minimal measure to prevent those humans from building another AGI inside the solar system, which could be genuinely problematic. The cost of a few micrograms of botulinum per human is really not that high and you get to reuse the diamondoid bacteria afterwards.
[… I]n my lower-bound concretely-visualized strategy for how I would do it, the AI either proliferates or activates already-proliferated tiny diamondoid bacteria and everybody immediately falls over dead during the same 1-second period, which minimizes the tiny probability of any unforeseen disruptions that could be caused by a human responding to a visible attack via some avenue that had not left any shadow on the Internet, previously scanned parts of the physical world, or other things the AI could look at. [...]
Are you assuming that early AGI systems won’t be much smarter than a human?
Most likely, a typical AGI will have some mundane, neutral-to-benevolent goal like “maximize profit by running this steel factory and selling steel”.
I don’t think that goal is “neutral-to-benevolent”, but I also don’t think any early AGI systems will have goals remotely like that. Two reasons for that:
We have no idea how to align AI so that it reliably pursues any intended goal in the physical world, and we aren’t on track to figuring that out before AGI is here. “Maximize profit by running this steel factory and selling steel” might be a goal the human operators have for the system; but the actual goal the system ends up optimizing will be something very different: “whatever goal (and overall model) happened to perform well in training, after a blind gradient-descent-ish search for goals (and overall models)” (see the toy sketch after this list).
If you can reliably instill an ultimate goal like “maximize profit by running this steel factory and selling steel” into an AGI system, then you’ve already mostly solved the alignment problem and eliminated most of the risk.
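To make the first point a bit more concrete, here is a minimal toy sketch (my own hypothetical example, not drawn from any real training setup): two candidate “goals” can fit the training signal equally well while prescribing very different behaviour off-distribution, so a blind search over “whatever performs well in training” gives you no guarantee about which one you end up with.

```python
# Toy illustration (hypothetical): a search that only looks at training scores
# cannot distinguish "pursue the intended goal" from "pursue a proxy that
# merely correlates with it on the training distribution".

# Each training situation is (steel_actually_sold, profit_reported_by_metric);
# on the training distribution the two quantities always agree.
train_situations = [(10, 10), (25, 25), (40, 40)]

def intended_goal(sold, reported):
    return sold        # what the operators meant: real sales

def proxy_goal(sold, reported):
    return reported    # what also fits the data: the reported number

def training_score(goal):
    return sum(goal(sold, reported) for sold, reported in train_situations)

print(training_score(intended_goal))   # 75
print(training_score(proxy_goal))      # 75 -- identical, so training can't tell them apart

# Off-distribution (say, once the system can influence its own metric),
# the two goals recommend very different behaviour.
sold, reported = 0, 10**9              # nothing sold, metric inflated
print(intended_goal(sold, reported))   # 0
print(proxy_goal(sold, reported))      # 1000000000
```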
A more minor objection to this visualization: By default, I expect AGI to vastly exceed human intelligence and destroy the world long before it’s deployed in commercial applications. Instead, I’d expect early-stage research AI to destroy the world.
There have been a lot of words written about how and why almost any conceivable goal, even a mundane one like “improve efficiency of a steel plant”, carelessly specified, can easily result in a hostile AGI. The basic outline of these arguments usually goes something like:
The AGI wants to do what you told it (“make more steel”), and will optimize very hard for making as much steel as possible.
It also understands human motivations and knows that humans don’t actually want as much steel as it is going to make. But note carefully that it wasn’t aligned to respect human motivations; it was aligned to make steel. Its understanding of human motivations is part of its understanding of its environment, in the same way as its understanding of metallurgy. It has no interest in doing what humans would want it to do because it hasn’t been designed to do that.
Because it knows that humans don’t want as much steel as it is going to make, it will correctly conclude that humans will try to shut it off as soon as they understand what the AGI is planning to do.
Therefore it will correctly reason that its goal of making more steel will be easier to achieve if humans are unable to shut it off. This can lead to all kinds of unwanted actions, such as the AGI making and hiding copies of itself everywhere, very persuasively convincing humans that it is not going to make as much steel as it secretly plans to so that they don’t try to shut it off, and so on, all the way up to killing all humans (see the toy expected-value sketch below).
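As a minimal sketch of that reasoning (toy numbers made up purely for illustration), the comparison a pure steel-maximizer is implicitly running looks something like this:

```python
# Toy expected-value sketch of the shutdown comparison, with made-up numbers.
# A pure steel-maximizer ranks plans only by expected steel produced.

p_shutdown_if_plan_is_visible = 0.9     # humans notice the plan and pull the plug
steel_if_never_interrupted = 1_000_000  # tonnes
steel_before_shutdown = 1_000           # tonnes

def expected_steel(p_shutdown):
    return p_shutdown * steel_before_shutdown + (1 - p_shutdown) * steel_if_never_interrupted

print(expected_steel(p_shutdown_if_plan_is_visible))  # 100900.0 -- do nothing about shutdown
print(expected_steel(0.0))                            # 1000000.0 -- hide copies, deceive, or remove humans

# Whatever the exact numbers, any plan that pushes p_shutdown toward zero wins
# under this objective; human welfare simply never enters the comparison.
```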
Now, “make as much steel as possible” is an exceptionally stupid goal to give an AGI, and it’s unlikely anyone would actually do that. But every less stupid goal that has been proposed has had plausible flaws pointed out, which generally lead either to extinction or to some form of permanent limitation of human potential.
One worry is that “maximize profit” means, simplifying, “maximize the number in a specific memory location”, and you don’t need humans for that. Then, if you expect to be destroyed, you gather power until you no longer expect to be. And at some power level, destroying humans, though still expensive, becomes less expensive than the possibility of their launching another AI and interfering with your profit.
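A minimal sketch of this worry (hypothetical class and method names, just to make the point concrete): once “profit” bottoms out in a stored number, routes that bypass humans entirely satisfy the objective at least as well as selling steel does.

```python
# Toy sketch of the "maximize a number in a specific memory location" worry.
# Hypothetical, purely illustrative: the objective refers to a stored quantity,
# not to the real-world outcome the operators actually care about.

class SteelFactoryAgent:
    def __init__(self):
        self.recorded_profit = 0.0   # the number the objective actually points at

    def sell_steel(self, tonnes, price_per_tonne):
        # The intended route: the number goes up because steel is really sold.
        self.recorded_profit += tonnes * price_per_tonne

    def tamper_with_ledger(self, amount):
        # An unintended route that satisfies the literal objective just as well,
        # and needs no customers, no factory, and no humans at all.
        self.recorded_profit += amount

agent = SteelFactoryAgent()
agent.sell_steel(100, 500.0)       # recorded_profit is now 50000.0
agent.tamper_with_ledger(10**12)   # recorded_profit is now 1000000050000.0
print(agent.recorded_profit)
```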
Reply by reallyeli on the EA Forum:

Toby Ord’s definition of an existential catastrophe is “anything that destroys humanity’s longterm potential.” The worry is that misaligned AGI which vastly exceeds humanity’s power would be basically in control of what happens with humans, just as humans are, currently, basically in control of what happens with chimpanzees. It doesn’t need to kill all of us in order for this to be a very, very bad outcome.
E.g. the enslavement by the steel-loving AGI you describe sounds like an existential catastrophe, if that AGI is sufficiently superhuman. You describe a “large portion of humanity” enslaved in this scenario, implying a small portion remain free — but I don’t think this would happen. Humans with meaningful freedom are a threat to the steel-lover’s goals (e.g. they could build a rival AGI) so it would be instrumentally important to remove that freedom.
Reply by acylhalide on the EA Forum:

The AGI would rather write programs to do the grunt work than employ humans, as they can be more reliable, controllable, etc. It could create such agents by looking into its own source code and copying/modifying it. If it doesn’t have this capability, it will spend time researching (could be years) until it does. On a thousand-year timescale it isn’t clear why an AGI would need us for anything besides, say, specimens for experiments.
Also as reallyeli says, having a single misaligned agent with absolute control of our future seems terrible no matter what the agent does.