And I can see why convincing anyone this late has less benefit, but maybe it’s still worth it?
I don’t know what you mean by “worth it”. I’m not planning to make a warning shot happen, and would strongly advise that others not do so either. :p
A very late-stage warning shot might help a little, but the whole scenario seems unimportant and ‘not where the action is’ to me. The action is in slow, unsexy earlier work to figure out how to actually align AGI systems (or failing that, how to achieve some non-AGI world-saving technology). Fantasizing about super-unlikely scenarios that wouldn’t even help strikes me as a distraction from figuring out how to make alignment progress today.
I’m much more excited by scenarios like: ‘a new podcast comes out that has top-tier-excellent discussion of AI alignment stuff, it becomes super popular among ML researchers, and the culture, norms, and expectations of ML thereby shift such that water-cooler conversations about AGI catastrophe are more serious, substantive, informed, candid, and frequent’.
It’s rare for a big positive cultural shift like that to happen; but it does happen sometimes, and it can result in very fast changes to the Overton window. And since it’s a podcast containing many hours of content, there’s the potential to seed subsequent conversations with a lot of high-quality background thoughts.
By comparison, I’m less excited about individual researchers who explicitly say the words ‘I’ll only work on AGI risk after a catastrophe has happened’. This is a really foolish view to consciously hold, and makes me a lot less optimistic about the relevance and quality of research that will end up produced in the unlikely event that (a) a catastrophe actually happens, and (b) they actually follow through and drop everything they’re working on to do alignment.