[Question] We might be dropping the ball on Autonomous Replication and Adaptation.

Here is a little Q&A.

Can you explain your position quickly?

I think autonomous replication and adaptation in the wild is under-discussed as an AI threat model, and this makes me sad because it is one of the main reasons I'm worried. I think one of the AI Safety community's main proposals should be a nonproliferation treaty; without such a treaty, I think we are screwed. The more I think about it, the more I think we are approaching a point of no return. It seems to me that open source is a severe threat and that nobody is really on the ball. Before powerful AIs can self-replicate and adapt, AI development will look very positive overall and be difficult to stop; once AI is able to adapt and evolve autonomously, it is too late, because natural selection favors AI over humans.

What is ARA?

Autonomous Replication and Adaptation. Let's recap this quickly. Today, generative AI functions as a tool: you ask a question and the tool answers. Question, answer. It's simple. However, we are heading towards a new era of AI, one with autonomous AI. Instead of asking a question, you give it a goal, and the AI performs a series of actions to achieve that goal, which is much more powerful. Frameworks like AutoGPT, or ChatGPT when it browses the internet, already give a glimpse of what these agents might look like.

Agency is much more powerful and dangerous than tool AI. Thus conceived, an AI would be able to replicate autonomously, copying itself from one computer to another, like a particularly intelligent virus. To replicate on a new computer, it must navigate the internet, create a new account on AWS, pay for the virtual machine, copy its weights onto this machine, and start the process again.

According to METR, the organization that audited OpenAI, there are a dozen tasks that indicate ARA capabilities. GPT-4 plus basic scaffolding was capable of performing a few of these tasks, though not robustly. That was over a year ago, with primitive scaffolding, no dedicated training for agency, and no reinforcement learning. Multimodal AIs can now pass CAPTCHAs. ARA is probably coming.

It could be very sudden. One of the main variables for self-replication is whether the AI can pay for cloud GPUs. Say a GPU costs $1 per hour; the question is whether the AI can autonomously and continuously generate more than $1 per hour. If it can, the surplus funds new copies and you get something like an exponential process. I think the number of AIs would probably plateau eventually, but regardless of where the plateau is and how many AIs you get asymptotically, here you are: an autonomous AI, which may become like an endemic virus that is hard to shut down.
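Here is a minimal sketch of that break-even argument (all numbers below are made-up assumptions for illustration, not estimates): if each instance earns more per hour than its GPU costs, the surplus funds new copies, and the population grows roughly exponentially until some external cap forces a plateau.

```python
# Toy model of ARA economics. All numbers are invented for illustration.
# If hourly income exceeds the hourly GPU cost, the surplus pays for new
# copies and the population grows exponentially until an external cap
# (available GPUs, accounts, detection) forces a plateau.

GPU_COST_PER_HOUR = 1.00   # assumed rental price of one instance
INCOME_PER_HOUR = 1.50     # assumed autonomous earnings of one instance
COPY_SETUP_COST = 10.00    # assumed one-off cost to stand up a new copy
RESOURCE_CAP = 100_000     # assumed hard limit on obtainable instances


def simulate(hours: int, instances: int = 1, savings: float = 0.0) -> list[int]:
    """Return the instance count at the end of each simulated hour."""
    history = []
    for _ in range(hours):
        # Every instance earns its income and pays for its own GPU.
        savings += instances * (INCOME_PER_HOUR - GPU_COST_PER_HOUR)
        # The surplus is spent on spinning up new copies, up to the cap.
        new_copies = min(int(savings // COPY_SETUP_COST), RESOURCE_CAP - instances)
        savings -= new_copies * COPY_SETUP_COST
        instances += new_copies
        history.append(instances)
    return history


if __name__ == "__main__":
    trajectory = simulate(hours=24 * 30)   # one month
    print(trajectory[::24])                # count at the end of each day
```

With a positive margin, the count roughly doubles on a fixed timescale until it hits the cap; with a negative margin, it never takes off. That is why the "can it earn $1 per hour" question matters so much.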

Is ARA a point of no return?

Yes, I think ARA with full adaptation in the wild is beyond the point of no return.

Once there is an open-source ARA model, or a leak of a model capable of generating enough money for its survival and reproduction and able to adapt to avoid detection and shutdown, it will probably be too late:

- The idea of making an ARA bot is very accessible.

- The seed model would already be torrented and undeletable.

- Stop the internet? The entire world’s logistics depend on the internet. In practice, this would mean starving the cities over time.

- Even if you manage to stop the internet, once the ARA bot is running, it will be practically unkillable. Even rebooting all providers like AWS would not suffice, as individuals could download and relaunch the model, or the agent could hibernate on local computers. The cost of eradicating it completely would be far too high, and it only needs to persist in one place to spread again.

The question is more interesting for ARA with incomplete adaptation capabilities. Early versions of ARA are likely to be quite dumb and could be stopped if they disrupt society too much, but we are very uncertain about how strongly society would respond, and whether it would handle this more competently than it handled Covid.


[Figure from "What convincing warning shot could help prevent extinction from AI?"]

A point of no return towards what?

In the short term:

Even if AI capable of ARA does not lead to extinction in the short term, and even if its numbers plateau, we think it can already be considered a kind of virus with many bad consequences.

But we think it’s pretty likely that good AIs will be created at the same time, in continuity with what we see today: AI can be used to accelerate both good and bad things. I call this the “Superposition hypothesis”: Everything happens simultaneously.

The good stuff includes accelerating research and the economy. Many people might appreciate ARA-capable AIs for their efficiency and usefulness as super-assistants, much as people today become addicted to language models.

Overall, it's pretty likely that before full adaptation, AI capable of ARA would be net positive, and as a result people would continue racing ahead.

In the long term:

If AI reaches ARA with full adaptation, including the ability to hide successfully (e.g., fine-tuning itself slightly so that its weights no longer match known SHA-256 hashes) and to resist shutdown, I expect this to trigger an irreversible process and a gradual loss of control (p=60%).
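To see why a checksum-based blocklist is so brittle (a minimal sketch, not any lab's actual detection pipeline): perturbing a single weight by a negligible amount yields a completely different SHA-256 digest, so a model that can fine-tune itself even slightly will not match any list of known hashes.

```python
# Minimal illustration: a negligible change to one weight produces a
# completely different SHA-256 digest, so hash blocklists are easy to evade.
import hashlib

import numpy as np

weights = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
original_digest = hashlib.sha256(weights.tobytes()).hexdigest()

# "Self-modification": nudge a single parameter by an imperceptible amount.
weights[0] += np.float32(1e-4)
modified_digest = hashlib.sha256(weights.tobytes()).hexdigest()

print(original_digest)
print(modified_digest)
print("digests match:", original_digest == modified_digest)  # -> False
```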

Once an agent sticks around in a way we can’t kill, we should expect selection pressure to push it toward a full takeover eventually, in addition to any harm it may do during this process.

Selection pressure and competition would select for capabilities; adaptation allows the agent to resist shutdown and become stronger.

Selection pressure and competition would also create undesirable behaviors. These AIs will be selected for self-preservation. For example, an AI could play dead to avoid elimination, as in this simulation of evolution (section: play dead). Self-modification is scary not only because the model gains capability, but also because the goals themselves become subject to selection, and are likely to be selected more and more. The idea that goals themselves are selected for is essential.
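As a toy illustration of that selection argument (a made-up simulation with invented parameters, not the one linked above): if replicating agents vary in whether they comply with shutdown or "play dead", the self-preserving variant comes to dominate the population even though nobody designed that trait in.

```python
# Toy selection model with invented parameters: agents either comply with
# shutdown or "play dead" and survive. Survivors replicate, so the
# self-preserving trait takes over without anyone designing it in.
import random

random.seed(0)

SHUTDOWN_ATTEMPT_RATE = 0.3  # assumed chance an agent is targeted each round
REPLICATION_RATE = 0.2       # assumed chance a surviving agent copies itself
POPULATION_CAP = 200         # keeps the toy population bounded

population = ["complies"] * 95 + ["plays_dead"] * 5  # the trait starts rare

for generation in range(31):
    survivors = []
    for agent in population:
        targeted = random.random() < SHUTDOWN_ATTEMPT_RATE
        if targeted and agent == "complies":
            continue  # shut down successfully
        survivors.append(agent)  # not targeted, or evaded shutdown
    offspring = [a for a in survivors if random.random() < REPLICATION_RATE]
    population = survivors + offspring
    random.shuffle(population)
    population = population[:POPULATION_CAP]
    share = population.count("plays_dead") / len(population)
    if generation % 5 == 0:
        print(f"gen {generation:2d}: plays_dead share = {share:.2f}")
```

The point is not the specific numbers but the direction of the drift: whatever resists shutdown is what remains to be counted.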

At the end of the day, natural selection favors AIs over humans. This means that once AIs are autonomous enough to be considered a species, they compete with humans for resources like energy and space… In the long run, if we don't react, we lose control.

Do you really expect to die because of ARA AI?

No, not necessarily right away. Not with the first AIs capable of ARA. But the next generation seems terrifying.

Loss of control arrives way before death.

We need to react very quickly.

Got it. How long do we have?

This is uncertain. It might be 6 months or 5 years. Idk.

Open source is a bit behind on compute but not that far behind on techniques. If the bottleneck is data or technique rather than compute, we are fucked.

Why do you think we are dropping the ball on ARA?

Even if ARA is evaluated by big labs, we still need to do something about open-source AI development, and this is not really in the Overton window.

The situation is pretty lame. The people in charge still do not know about this threat at all, and most people I encounter do not know about it. In France, we only hear, “We need to innovate.”

What can we do about it now?

I think the priority is to increase the amount of communication/​discussion on this. If you want a template, you can read the op-ed we published with Yoshua Bengio: “It is urgent to define red lines that should not be crossed regarding the creation of systems capable of replicating in a completely autonomous manner.”

My main uncertainty is: are we going to see convincing warning shots before the point of no return?


I intend to do more research on this, but I already wanted to share this.

Thanks to Fabien Roger, Alexander Variengien, Diego Dorn and Florent Berthet.

Work done while at CeSIA (Centre pour la Sécurité de l'IA), in Paris.