If causing catastrophes is difficult, this should reduce our concern with both misuse and rogue AIs causing sudden extinction. Other concerns like military arms races, lock-in of authoritarian regimes, or Malthusian outcomes in competitive environments would become relatively more important.
I agree that “causing catastrophes is difficult” should reduce concerns with “rogue AIs causing sudden extinction (or merely killing very large numbers of people like >1 billion)”.
However, I think these sorts of considerations don't reduce the risk of AI takeover or other catastrophes caused by rogue AI as much as you might think, for a few reasons:
Escaped rogue AIs might be able to take many obviously bad actions autonomously over a long period. E.g., acquire money, create a cult, use this cult to build a bioweapons lab, and then actually develop bioweapons over a long-ish period (e.g., 6 months) using >tens of thousands of queries to the AI. This looks quite different from the misuse threat model, which requires that omnicidal (or otherwise bad) humans possess the agency to make the right queries to the AI and to solve the problems the AI can't solve. For instance, the humans have to ensure their queries are sufficiently subtle/jailbroken to avoid detection via various other mechanisms. A rogue AI can train its humans over a long period, and all the agency/competence can come from the rogue AI. So, even if misuse by humans is unlikely, autonomous rogue AIs making weapons of mass destruction is perhaps more likely.
Escaped rogue AIs are unlike misuse in that even if we notice a clear and serious problem, there might be less we can do about it. E.g., the AIs might have already built hidden datacenters we can't find. Even if they haven't and are just autonomously replicating on the internet, shutting down the internet is extremely costly and only postpones the problem.
AI takeover can route through mechanisms other than sudden catastrophe/extinction. E.g., allying with rogue states, or creating a rogue-AI-run AI lab which builds even more powerful AI as fast as possible. (I'm generally somewhat skeptical of AIs trying to cause extinction for reasons discussed here, here, and here. Though causing huge amounts of damage (e.g., >1 billion dead) seems somewhat more plausible as something rogue AIs would try to do.)
Yep, agreed on the individual points, not trying to offer a comprehensive assessment of the risks here.