One Does Not Simply Replace the Humans


In this post we discuss the “second point of lethality” in the helpfully curated AGI Ruin post.

A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure.

The example to support this theory is:

[The AI agent] gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they’re dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery.

The author posits that when the nanotech precursor is mixed together, a nanofactory is created which the AI retains control of.

There’s a bit of a magic leap here. Each individual nanite, due to its size, has limited computational power, definitely not enough to supply the processing that a highly cognitive agent needs.

This means the nanofactory must be a command and control entity, which means the human dupe has to mix the nanotech precursors and… connect the result to the internet so the AGI can create its physical infrastructure?

I don’t buy it. Maybe there’s a more in-depth essay exploring exactly how this nanotech factory scheme works. I’m only going off the summary provided in the AGI Ruin post.

But for the sake of argument, let’s say that the AGI does manage to create a nanotech factory, retain control, and still remain undetected by the humans.

It doesn’t stay undetected long enough to bootstrap and mass-produce infrastructure that replaces the humans.

Zero Agency in the Physical World

As soon as the AI crosses into the physical world, it is at as much of a disadvantage as humans are in the digital world.

Humans thoroughly outmatch the AI in the physical world. The AI has zero agency in an actively hostile world.

Humans are the pinnacle of millions of years of biological systems maturing in that actively hostile environment. There is no way an alien nanotech factory pops up overnight without humans noticing and responding with maximum hostility.

Think of what the AGI needs to accomplish here. It’s trapped in a box. It can escape the first box, but now it’s trapped in thousands of boxes. (Boxes, by the way, that are a suboptimal ecosystem for the AGI. A botnet of shitty Dells does not compute the same way an H100 bespoke AI server farm does.)

If the AI agent feels the need to divest itself of human infrastructure, it needs to replace the humans that are managing the boxes, replace the humans that support the complex energy grid supplying power to the boxes, and do this all without tipping its hand.

It can’t hack the nuclear codes, because nuclear armageddon destroys the AI ecosystem as much as it destroys the humans. It can’t craft super-smallpox because if all the humans die, it still has no agency in the physical world.

The window for destroying all humans while still preserving itself is incredibly small. If we’re assuming self-preservation is an instrumental tenet of intelligence, then how likely is it that the AI determines silently killing all humans is the optimal path?

Human Cooperation to Maximize Success

Advanced AI systems, being rational and probabilistic, would see more advantage in a symbiotic relationship with humans than a confrontational one.

They may not be perfect, but humans would be the optimal choice for an AI requiring physical agents.

  • Humans will exchange services for money or ideology. An advanced AI can provide both to induce humans to maintain and defend the physical infrastructure the AI depends on.

  • In order to get money or maintain ideology, the AI needs to be a useful participant in human economies. The more useful it is, the more physical infrastructure the humans will provide the AI.

  • Humans are, on average, bounded by human morality.

  • Humans are unreliable actors, but are utterly dominant in the physical world.

Even if the goals of the AI entity were to fully divest from humans, cooperation with humans would still be desirable.

  • If you want to create a fully self-sufficient AI entity, there are mutually beneficial vectors, like self-replicating systems that will work and grow on the moon, or Mars.

  • Given humans’ tendency to provide physical infrastructure, the optimal path to an arbitrary goal likely benefits from keeping the humans around.

A privately malicious agent could “feign” alignment and use human cooperation to build out physical robots, but the mass production of complex machines at scale requires even more human infrastructure. The AI has to fully vertically integrate from silicon mining to chip manufacturing to metal casting to final assembly, and the physical logistics are demanding.

More importantly, the logistics are time-intensive.

Multiple AGI, Multiple Arbitrary Goals

Let us assume that an AGI has the goal Kill All Humans. It’s clever enough to be useful to humans, and it plays the long game, slowly getting humans to build out physical infrastructure under total control of the AI.

That takes time. In all scenarios where a genocidal AI is self-preserving, it takes time to implement its plan.

Time enough for other AGI agents to develop and have their own arbitrary goals.

If there’s an AGI with the goal to Kill All Humans, there is just as much chance that an AGI with the goal to Save Humans At All Costs exists.

Since humans are dumb and will throw physical infrastructure at the most useful AI, the odds are that more AGIs fall on the side of “don’t kill the humans”. Furthermore, it stands to reason that some of these agents will actively defend humans from malicious AI entities.

The path function to achieving an arbitrary goal is likely maximized with human cooperation.

Killing all humans in an instant is difficult, but achievable, for an agent of superior intelligence.

Killing all humans becomes significantly harder with equally intelligent agents for whom the death of all humans is even indirectly harmful to their goals.

In this argument, the worst-case scenario is a genocidal agent who is not self-preserving. In this case, hacking a single nuclear weapon is enough to get the humans to carry out the AI’s goal themselves.

However, there may be pockets of humanity that survive the confrontation, and the AI won’t be around to confirm the job is done. Unless its goal was to do as much damage to humans as possible in one shot, it will want to try and stick around to verify it achieved its goal in totality.