If AI labs are rushing toward recursive self-improvement ASAP, it may be that Autonomous Replicating Agents (ARA) are irrelevant. But that’s an “ARA can’t destroy the world if AI labs do it first” argument.
An ARA may well have more compute than the AI labs, especially if the labs are trying to stay within the law while the ARA is stealing any money/compute it can hack its way into. (Which could be >90% of the internet, if it’s good at hacking.)
> there will be millions of other (potentially misaligned) models being deployed deliberately by humans, including on very sensitive tasks (like recursive self-improvement).
Ok. That’s a world model in which humans are being INCREDIBLY stupid.
If we want to actually win, we need to do both: be careful about deploying those other misaligned models, and stop ARA.
Alice: That snake bite looks pretty nasty, it could kill you if you don’t get it treated.
Bob: That snake bite won’t kill me; this hand grenade will. (Pulls out the pin.)