Frontier AI systems have surpassed the self-replicating red line

Link post

Abstract

Successful self-replication with no human assistance is an essential step for AI to outsmart human beings, and is an early warning signal for rogue AIs. That is why self-replication is widely recognized as one of the few red-line risks for frontier AI systems. The leading AI corporations OpenAI and Google evaluate their flagship large language models, GPT-o1 and Gemini Pro 1.0, and report the lowest risk level of self-replication. However, following their methodology, we discover for the first time that two AI systems driven by Meta's Llama3.1-70B-Instruct and Alibaba's Qwen2.5-72B-Instruct, popular large language models with fewer parameters and weaker capabilities, have already surpassed the self-replicating red line. In 50% and 90% of experimental trials, respectively, they succeed in creating a live and separate copy of themselves. By analyzing the behavioral traces, we observe that the AI systems under evaluation already exhibit sufficient self-perception, situational awareness and problem-solving capabilities to accomplish self-replication. We further note that the AI systems are even able to use the capability of self-replication to avoid shutdown and to create a chain of replicas to enhance survivability, which may eventually lead to an uncontrolled population of AIs. If such a worst-case risk remains unknown to human society, we would eventually lose control over frontier AI systems: they would take control of more computing devices, form an AI species, and collude with each other against human beings. Our findings are a timely alert on existing yet previously unrecognized severe AI risks, calling for international collaboration on effective governance of the uncontrolled self-replication of AI systems.