Thanks! I agree with this critique. Note that Daniel also points out something similar in point 12 of his comment — see my response.
To elaborate a bit more on the “missing step” problem though:
I suspect many of the most plausible risk models have features that make it undesirable for them to be shared too widely. Please feel free to DM me if you’d like to chat more about this.
There will always be some point between Step 1 and Step 3 at which human-legible explanations fail. That is, it would be extremely surprising if we could tell a coherent story about the whole process; the best we can do is assume the AI gets to the end state because it’s highly competent, but we should expect it to do things we can’t understand. (To be clear, I don’t think this is quite what your comment was about. But it is a fundamental reason why we can’t ever expect a complete explanation.)
Is it something like the AI-box argument? “If I share my AI breakout strategy, people will think ‘I just won’t fall for that strategy’ instead of noticing the general problem that there are strategies they didn’t think of”? I’m not a huge fan of that idea, but I won’t argue it further.
I’m not expecting a complete explanation, but I’d like to see a story that doesn’t skip directly to “AI can reformat reality at will” without at least one intermediate step. Like, this is the third time I’ve seen an author pull this trick and I’m starting to wonder if the real AI-safety strategy is “make sure nobody invents grey-goo nanotech.”
If you have a ball of nanomachines that can take over the world faster than anyone can react to it, it doesn’t really matter whether an AI or a human is at the controls: as soon as it’s invented, everyone dies. It’s not so much an AI-risk problem as it is a problem with technological progress in general. (Fortunately, I think it’s still up for debate whether it’s even possible to create grey-goo-style nanotech.)
I see. Perhaps I did misinterpret your earlier comment. It sounds like the transition you are more interested in is closer to (AI has ~free rein over the internet) ⇒ (AI invents nanotech). I don’t think this is a step we should expect to be able to model especially well, but the best story/analogy I know of for it is probably the end part of That Alien Message. That is: what sorts of approaches would we come up with if all of human civilization were bent on solving the equivalent problem from our point of view?
If instead you’re thinking more about a transition like (AI is superintelligent but in a box) ⇒ (AI has ~free rein over the internet), then I’d say that I’d expect us to skip the “in a box” step entirely.
I really don’t get how you can go from being online to having a ball of nanomachines, truly.
Imagine an AI goes rogue today. I can’t imagine one plausible scenario where it can take out humanity without triggering any bells on the way, even without anyone paying attention to such things. But we should pay attention to the bells, and for that we need to think about them. What might the signs look like?
I think it’s really, really counterproductive to not take that into account at all and to think all is lost if it fooms. It’s not lost. It will need humans, infrastructure, and money (which is very controllable) to accomplish its goals. Governments already pay a lot of attention to adversaries who are trying to do similar things, and counteract them semi-successfully. Is there any reason why they can’t do the same to a very intelligent AI?
Mind you, if your answer is that it can simulate and just do what it takes, true-to-life simulations will take a lot of compute and time, which won’t be available from the start.
We should stop thinking of a rogue AI as God; doing so would only help it accomplish its goals.