The comment of mine that you linked to was claiming a more-than-50% chance (perhaps much more) that any one particular debate training run (different runs might differ) would yield an AGI that wanted doom (though it might not successfully get it), conditional on model-based RL AGI trained by debate. So that’s different from what you wrote along multiple dimensions.
All-things-considered P(doom) isn’t something I’ve thought about enough to have a strong opinion on. I guess if I had to pick a number it would be 90%, but a lot of that is flowing through things that I’m not super well informed on. (E.g. offense-defense balance.)
Nice, it’s updated in the post now. I had used the 50% as a lower bound, but I see how that misrepresented your claim.