The case for stopping AI safety research

TLDR: AI systems are failing in obvious and manageable ways, for now. Fixing them will push the failure modes beyond our ability to understand and anticipate, let alone fix. The AI safety community is also doing developers a huge economic service. Our belief that our minds can “fix” a super-intelligence, especially bit by bit, needs to be rethought.

I have wanted to write this post for a long time, but now seems like a good moment. The case is simple; I hope it takes you about a minute to read.

  1. AI safety research is still solving easy problems. We are patching up the most obvious (to us) failure modes. As time goes on, we will no longer be able to play this existential-risk game of chess with AI systems. I’ve argued this at length (ICML 2024 spotlight paper; also www.agencyfoundations.ai). It seems others have had this thought as well.

  2. Capability development is getting AI safety research for free, likely worth millions to tens of millions of dollars: all the “hackathons” and “mini” prizes to patch something up or to propose a new way for society to digest and adjust to some new normal (and, increasingly, the incentivizing of existing academic labs).

  3. AI safety research is speeding up capabilities. I hope this is reasonably obvious to most readers.

I write this now because, in my view, we are about 5-7 years away from massive human biometric and neural datasets entering AI training. These datasets will likely generate amazing breakthroughs in long-term planning and in emotional and social understanding of the human world. They will also most likely increase x-risk radically.

Stopping AI safety research, or taking it in-house with security guarantees and the like, would slow down capabilities somewhat, and it may expose capability developers more directly to public opinion while harmful outcomes are still manageable.