I’m not sure about comparing warning shots to adding entropy more generally, because warning shots are a very specific and useful datapoint, not just any random data.
I don’t think warning shots are random, but if they have a large impact it may be in unexpected directions, perhaps for butterfly-effect-y reasons.
[Just for definition’s sake, I’m assuming a warning shot here is an AI that a) clearly demonstrates deceptive and malevolent-to-human intent, and b) either has enough capabilities to do some damage, or enough capability that one can much more easily extrapolate why increasing capability will do damage, and this extrapolation can be done even if you don’t have a really good world model.]
I’m not defining warning shots that way; I’d be much more surprised to see an event like that happen (because it’s more conjunctive), and I’d be much more confident that a warning shot like that won’t shift us from a very-bad trajectory to an OK one (because I’d expect an event like that to come very shortly before AGI destroys or saves the world, if an event like that happened at all).
When I say ‘warning shot’ I just mean ‘an event where AI is perceived to have caused a very large amount of destruction’.
The warning shots I’d expect to do the most good are ones like:
’20 or 30 or 40 years before we’d naturally reach AGI, a huge narrow-AI disaster unrelated to AGI risk occurs. This disaster is purely accidental (not terrorism or whatever). Its effect is mainly just to cause it to be in the Overton window that a wider variety of serious technical people can talk about scary AI outcomes at all, and maybe it slows timelines by five years or whatever. Also, somehow none of this causes discourse to become even dumber; e.g., people don’t start dismissing AGI risk because “the real risk is narrow AI symptoms like the one we just saw”, and there isn’t a big ML backlash to regulatory/safety efforts, and so on.’
I don’t expect anything at all like that to happen, not least because I suspect we may not have 20+ years left before AGI. But that’s a scenario where I could imagine real, modest improvements. Maybe. Optimistically.
And I can see why convincing anyone this late has less benefit, but maybe it’s still worth it?
I don’t know what you mean by “worth it”. I’m not planning to make a warning shot happen, and would strongly advise that others not do so either. :p
A very late-stage warning shot might help a little, but the whole scenario seems unimportant and ‘not where the action is’ to me. The action is in slow, unsexy earlier work to figure out how to actually align AGI systems (or failing that, how to achieve some non-AGI world-saving technology). Fantasizing about super-unlikely scenarios that wouldn’t even help strikes me as a distraction from figuring out how to make alignment progress today.
I’m much more excited by scenarios like: ‘a new podcast comes out that has top-tier-excellent discussion of AI alignment stuff, it becomes super popular among ML researchers, and the culture, norms, and expectations of ML thereby shift such that water-cooler conversations about AGI catastrophe are more serious, substantive, informed, candid, and frequent’.
It’s rare for a big positive cultural shift like that to happen; but it does happen sometimes, and it can result in very fast changes to the Overton window. And since it’s a podcast containing many hours of content, there’s the potential to seed subsequent conversations with a lot of high-quality background thoughts.
By comparison, I’m less excited about individual researchers who explicitly say the words ‘I’ll only work on AGI risk after a catastrophe has happened’. This is a really foolish view to consciously hold, and makes me a lot less optimistic about the relevance and quality of research that will end up produced in the unlikely event that (a) a catastrophe actually happens, and (b) they actually follow through and drop everything they’re working on to do alignment.