I think you raise an important point. If we solve alignment, do we still all die?
This has been discussed in the alignment community under the term “pivotal act.” It’s often been assumed that an aligned AGI would prevent the creation of further AGIs, both to avoid the accidental creation of misaligned AGIs and to avoid the deliberate creation of AGIs that are aligned to their creator’s goals but misaligned with the interests of most of humanity. Your misuse category falls into the latter. So you should search for posts under the term “pivotal act”; I don’t know of any particularly central ones off the top of my head.
However, I think this is worth more discussion. People have started to talk about “multipolar scenarios” in which we have multiple or many human-plus level AGIs. I’m unclear on how people think we’ll survive such a scenario, except by not thinking about it much. I think this is linked to the shift toward predicting a slower takeoff, in which AGI doesn’t become superintelligent all that quickly. But I think the same logic applies, even if we survive a few years longer.
I hope to be convinced otherwise, but I currently mostly agree with your logic for multipolar scenarios. I think we’re probably doomed to die if that’s allowed to happen. See What does it take to defend the world against out-of-control AGIs? for reasons that a single AGI could probably end the world even if friendly AGIs have a headstart in trying to defend it.
I’d summarize my concerns thus: self-improvement creates an unstable situation to which no game-theoretic cooperative equilibrium applies. It’s like playing Diplomacy when the players can change the rules arbitrarily on each turn. If there are many AGIs under human control, one will eventually have goals for the use of Earth at odds with those of humanity at large, whether because of an error in its alignment or because the human(s) controlling it have non-standard beliefs or values.
When this happens, I think it’s fairly easy for a self-improving AGI to destroy human civilization (although perhaps not other AGIs with good backup plans). It just needs to put together a hidden (perhaps off-planet, underground, or underwater) robotic production facility that can produce new compute and new robots. That’s if there’s nothing simpler and more clever to do, like diverting an asteroid or inventing a way to produce a black hole. The plans get easier the less you care about using the Earth immediately afterward.
I agree that this merits more consideration.
I also agree that the title should change. LW very much looks down on clickbait titles. I don’t think you intended to argue that AI won’t kill people, merely that people with AIs will. I believe you can edit the title, and you should.
Edit: I recognized the title and didn’t take you to be arguing against autonomous AI as a risk—but it does actually make that claim, so probably best to change it.
You have a later response to some clarifying comments from me, so this may be moot, but I want to call out that my emphasis is on the behavior of human agents who are empowered by automation that may fall well short of AGI. A “pivotal act” is a very germane idea, but rather than the pivotal act of the first AGI eliminating would-be AGI competitors, this act is carried out by humans taking out their human rivals.
It is pivotal because once the target population size has been achieved, competition ends, and further development of the AI technology can be halted as unnecessarily risky.