In case Joshua Achiam ends up reading this post, my question for him is:
My understanding is that you think P(misaligned AGI will kill all humans by 2032) is extremely low, like 1e-6.
Is this because:
We won’t have AGI by 2032?
Much-smarter-than-human AIs could take over, but AGI couldn't, and much-smarter-than-human AIs won't exist prior to 2032?
Misalignment could cause problems, but not extinction or literal AI takeover? (E.g., because misalignment won’t be this bad.)
AI takeover is plausible but wouldn’t kill every person?
What is your P(violent misaligned AI takeover | AGI by 2032)?[1]
Some clarification:
I’m using AGI to mean top-human-expert level AI. (As in, can obsolete top human experts in most non-physical tasks.) OpenAI typically uses a similar definition.
By “misalignment”, I mean “AIs that conspire against you and your countermeasures”, not “AIs which aren’t sufficiently robust to jailbreaks”.
My guess is that your view is that “misalignment” could be bad, but not existential, while other risks from AGI (e.g. new superweapons used in great power conflict) could be existential.
My view is that a misaligned AI that succeeds in takeover probably wouldn't kill literally every person. Still, takeover has a high probability of killing billions, killing everyone is plausible, and takeover would eliminate human control of the future, which is likely extremely bad.
So, I prefer to talk about the chances of “violent misaligned AI takeover” where violent means “a bunch of people die or are harmed”.
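To be explicit about why I'm decomposing the question into the options above (the numbers here are purely illustrative placeholders, not estimates I'm attributing to anyone), roughly:

$$P(\text{misaligned AI kills everyone by 2032}) \approx P(\text{AGI by 2032}) \cdot P(\text{violent misaligned takeover} \mid \text{AGI by 2032}) \cdot P(\text{everyone killed} \mid \text{takeover})$$

So getting the headline number down to something like $10^{-6}$ requires at least one of these factors to be very small: e.g., if $P(\text{AGI by 2032}) = 0.5$, the remaining two factors would have to multiply to $2 \times 10^{-6}$.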
(This is based on this twitter thread I wrote.)
Note that this includes takeover by later systems which are more powerful than AGI. I'm just conditioning on AGI being created before some date.