I’d love to see the most important types of work for each failure mode. Here’s my very quick version; any disagreements or additions are welcome:
Appreciate you doing a quick version. I’m excited for more attempts at this and would like to write something similar myself, though I might structure it the other way round if I do a high-effort version (take an agenda, work out how/if it maps onto the different parts of this). Will try to do a low-effort set of quick responses to yours soon.
P(Doom) for each scenario would also be useful.
Also in the (very long) pipeline, and a key motivation! Not just for each scenario in isolation, but also for various conditionals like:
- P(scenario B leads to doom | scenario A turns out not to be an issue by default)
- P(scenario B leads to doom | scenario A turns out to be an issue that we then fully solve)
- P(meaningful AI-powered alignment progress is possible before doom | scenario C is solved)

etc.
Thanks, both for the thoughts and encouragement!