By “sustainability,” I mean that a theory of victory should ideally not reduce AI x-risk per year to a constant, low level, but instead continue to reduce AI x-risk over time. In the former case, “expected time to failure”[31] would remain constant, and total risk over a long enough time period would inevitably reach unacceptable levels. (For example, a 1% chance of an existential catastrophe per year implies an approximately 63% chance over 100 years.)
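As a quick sanity check on that arithmetic, assuming the 1% annual risk is constant and independent across years:

$$P(\text{catastrophe within 100 years}) = 1 - (1 - 0.01)^{100} \approx 63.4\%.$$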
Obviously yes, a 1% p.a. chance of existential catastrophe is utterly unacceptable! I’m not convinced that “continues to reduce over time” is the right framing, though; if we achieved a low enough constant rate for an MTBF of many millions of years, I’d expect other projects to have higher long-term EV, given the very-probably-finite future resources available anyway. I also expect that the challenge is almost entirely in getting to an acceptably low rate, not in the further downward trend, so it’s really a moot point.
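(For concreteness, under a constant annual hazard rate $p$, the MTBF framing translates directly into a rate:

$$\text{MTBF} \approx \frac{1}{p}, \qquad \text{so an MTBF of } 10^7 \text{ years} \;\Leftrightarrow\; p \approx 10^{-7} \text{ per year,}$$

i.e. roughly a 0.001% chance of catastrophe over any given century.)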
(I’m looking forward to retiring from this kind of thing if or when I feel that AI risk and perhaps synthetic biorisk are under control, and going back to low-stakes software engineering R&D… though I’m not making any active plans.)
Yep, fair enough. I agree that an MTBF of millions of years is an alternative sustainable theory of victory.
Could you expand on “the challenge is almost entirely in getting to an acceptably low rate”? It’s not clear to me that that’s true. For example, it seems plausible that at some point nuclear risk was at an acceptably low rate (maybe after the fall of the USSR? I’m neither an expert nor old enough to remember) conditional on a further downward trend, but we didn’t get a further downward trend.