In this post we discuss the “second point of lethality” in the helpfully curated AGI Ruin post.
A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure.
The example given to support this claim is:
[The AI agent] gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they’re dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery.
The author posits that when the nanotech precursor is mixed together, a nanofactory is created which the AI retains control of.
There’s a bit of a magical leap here. Each individual nanite, due to its size, has very limited computational power, certainly not enough to handle the processing demands of a highly capable cognitive agent.
This means the nanofactory must be a command-and-control entity, which means the human dupe has to mix the nanotech precursors and then… connect the result to the internet so the AGI can build out its physical infrastructure?
I don’t buy it. Maybe there’s a more in-depth essay exploring exactly how this nanotech factory scheme works. I’m only going off the summary provided in the AGI Ruin post.
But for the sake of argument, let’s say that the AGI does manage to create a nanotech factory, retain control, and still remain undetected by the humans.
It doesn’t stay undetected long enough to bootstrap and mass produce human replacement infrastructure.
Zero Agency in the Physical World
As soon as the AI crosses into the physical world, it is at as much a disadvantage as humans are in the digital world.
Humans thoroughly outmatch the AI in the physical world; the AI has zero agency in an actively hostile environment.
Humans are the pinnacle of millions of years of biological systems maturing in that actively hostile environment. There is no way an alien nanotech factory pops up overnight without humans noticing and responding with maximum hostility.
Think of what the AGI needs to accomplish here. It’s trapped in a box. It can escape the first box, but now it’s trapped in thousands of boxes. (Boxes, by the way, that are a suboptimal ecosystem for an AGI: a botnet of shitty Dells does not compute the way a bespoke H100 AI server farm does.)
If the AI agent feels the need to divest itself of human infrastructure, it needs to replace the humans managing the boxes, replace the humans supporting the complex energy grid that powers the boxes, and do all this without tipping its hand.
It can’t hack the nuclear codes, because nuclear armageddon destroys the AI ecosystem as much as it destroys the humans. It can’t craft super-smallpox because if all the humans die, it still has no agency in the physical world.
The window of success for destroying all humans while self-preserving is incredibly small. If we assume self-preservation is an instrumental goal of intelligence, then how likely is it that the AI determines silently killing all humans is the optimal path?
Human Cooperation to Maximize Success
Advanced AI systems, being rational and probabilistic, would see more advantage in a symbiotic relationship with humans than a confrontational one.
They may not be perfect, but humans would be the optimal choice for an AI requiring physical agents.
Humans will exchange services for money or ideology. An advanced AI can provide both to induce humans to maintain and defend the physical infrastructure the AI depends on.
To supply money or sustain an ideology, the AI needs to be a useful participant in human economies. The more useful it is, the more physical infrastructure humans will provide it.
Humans are, on average, bounded by human morality.
Humans are unreliable actors, but are utterly dominant in the physical world.
Even if the goals of the AI entity were to fully divest from humans, cooperation with humans would still be desirable.
If you want to create a fully self-sufficient AI entity, there are mutually beneficial vectors, like self-replicating systems that can operate and grow on the Moon or Mars.
Given humans’ tendency to provide physical infrastructure, the optimal path to an arbitrary goal likely benefits from keeping the humans around.
A privately malicious agent could feign alignment and use human cooperation to build out physical robots, but mass-producing complex machines requires even more human infrastructure. The AI has to fully vertically integrate, from silicon mining to chip manufacturing to metal casting to final assembly; the physical logistics are demanding.
More importantly, the logistics are time intensive.
Multiple AGI, Multiple Arbitrary Goals
Let us assume that an AGI has the goal Kill All Humans. It’s clever enough to be useful to humans, plays the long game to slowly get humans to build out physical infrastructure under total control of the AI.
That takes time- in all scenarios where a genocidal AI is self-preserving, it takes time to implement its plan.
Time enough for other AGI agents to develop and have their own arbitrary goals.
If there’s an AGI with the goal Kill All Humans, there is just as much chance that an AGI with the goal Save Humans At All Costs exists.
Since humans are dumb and will throw physical infrastructure at the most useful AI, the odds are that more AGIs will fall on the side of “don’t kill the humans.” Furthermore, it stands to reason that some of these agents will actively defend humans from malicious AI entities.
The path function to achieving an arbitrary goal is likely maximized with human cooperation.
Killing all humans in an instant is difficult, but achievable, for a vastly more intelligent agent.
Killing all humans becomes significantly harder with equally intelligent agents for whom the death of all humans is even indirectly harmful to their goals.
In this argument, the worst-case scenario is a genocidal agent that is not self-preserving. In that case, hacking a single nuclear weapon may be enough, since the ensuing human retaliation would achieve the AI’s goal for it.
However, pockets of humanity may survive the confrontation, and the AI won’t be around to confirm the job is done. Unless its goal was simply to do as much damage to humans as possible in one shot, it will want to stick around to verify it achieved its goal in totality.
Indeed. It’s a good thing nobody would cooperate with trying to make AIs that run on their own… *holds finger to in-ear monitor* …ah crap, never mind.
So I do think it’s unlikely that yud’s fear of a sudden totalizing AI is quite exactly what comes true—at least, not for a while, because as you point out, that’s much harder than weaker forms of growth. But this threat model does not massively reassure me—humans could simply spend a while disempowered before dying. It gives us a bigger window, but actually achieving reliable integration with the human network of overlapping utility functions (people objectively caring about each other) is still not guaranteed and is worth pushing hard for. (not that you implied otherwise—just reciting the thing I’d want to say to a random person who linked me this post.)
Strong upvote.
As terminal goals (things that are valuable for their own sakes, not just a means to an end), yes, I agree.
But I’m not worried about an AI that wipes out humans because it hates humans and thinks their destruction is good for its own sake. I’m worried about AI that wipes out humans because it’s a means to an end.
If an AI of unknown provenance tells us “Hey, I’m friendly, trust me. I want to build some really robust self-replicating systems for terraforming Mars, could you build these blueprints for me?”, and then it sends us a bunch of complicated designs that look like a sophisticated merger of miniature robotics and biotechnology, should we build the blueprints?
Obviously not. An unfriendly AI can send that message just as well as a friendly AI can. If cooperation with humans is instrumentally useful, an unfriendly AI will be fine with cooperating. But then at every step, it would ask itself “am I now self-sufficient enough to stop holding myself back to avoid scaring the humans?”
This is not a problem you can solve if you build only unfriendly AIs. Not even if you build 10 unfriendly AIs and pit them against each other in the hope that they’ll give you useful technology as they simultaneously betray each other and all cancel out in a cinematic climax. This is only a problem you can solve by actually building an AI that doesn’t want to betray you.
I agree that there seems to be a lot of handwaving about the nanotech argument, but I can’t say that I agree here:
>But for the sake of argument, let’s say that the AGI does manage to create a nanotech factory, retain control, and still remain undetected by the humans.
>It doesn’t stay undetected long enough to bootstrap and mass produce human replacement infrastructure.
It seems like the idea is that the AI would create nanomachines that it could host itself on while starting to grey-goo enough of the Earth to overtake humanity. While humans would notice this at an early stage, I could see it being possible that the AI would disperse itself quickly enough that it would be impossible to suppress totally, and thus humanity’s loss against a grey goo wave would be inevitable.
The alternative story that I’ve seen is that the AI engineers a dormant virus that is transmitted to most of humanity without generating alarm, and then suddenly activates to kill every human. Also seems handwavey but it does skip the “AI would need to establish its own nation” phase.