The comment of mine that you linked to was claiming a more-than-50% chance (perhaps much more) that any one particular debate training run (different runs might differ) would yield an AGI that wanted doom (though it might not successfully get it), conditional on model-based RL AGI trained by debate. So that’s different from what you wrote along multiple dimensions.
All-things-considered P(doom) isn’t something I’ve thought about enough to have a strong opinion on. I guess if I had to pick a number it would be 90%, but a lot of that is flowing through things that I’m not super well informed on. (E.g. offense-defense balance.)
Nice, it’s updated in the post now. I had used the 50% as a lower bound, but I see how that misrepresented your claim.