3. At some point, some set of AI agents will be such that:
- they will all be able to coordinate with each other to try to kill all humans and take over the world; and
- if they choose to do this, their takeover attempt will succeed.[13]
There are way too many assumptions about what “AI” is baked into this. Suppose you went back 50 years and told people, “in the year 2024, everyone will have an AI agent built into their phone that they rely on for critical-to-life tasks (such as finding directions to the grocery store).”
An observer from 50 years ago would probably say something like “that sounds like a dangerous AI system that could easily take control of the world.” But in fact, no one worries about Siri “coordinating” to suddenly give us all wrong directions to the grocery store, because that’s not remotely how such assistants work.
Trying to reason today about what future AI agents will look like is equally fraught.
Second: for any failure you don’t want to ever happen, you always need to avoid that failure on the first try (and the second, the third, etc.).
I think this is the crux of my concern. Obviously, if AI kills us all, there will be some moment at which that outcome became inevitable, but merely stating that fact adds no new information. I think any attempt to predict what AI agents will do from “pure reasoning,” as opposed to careful empirical study of the capabilities of existing AI models, is basically doomed to failure.
in fact, no one worries about Siri “coordinating” to suddenly give us all wrong directions to the grocery store, because that’s not remotely how assistants work.
Note that Siri is not capable of the threatening kinds of coordination at issue here. But I do think that by the time we actually face a situation where AIs are capable of coordinating to successfully disempower humanity, we may well know enough about “how they work” that we aren’t worried about it.