Hmm, I dunno, I haven’t thought it through very carefully. But I guess an AGI might require a supercomputer’s worth of resources, and maybe there are only so many hackable supercomputers of the right type, and the AI only knows one exploit and leaves traces of its hacking that computer security people can follow, and meanwhile self-improvement is hard and slow (for example, the first version needs to train for two straight years, and the second, self-improved version “only” needs to re-train for 18 months). If the AI can run on a botnet then there are more options, but maybe it can’t deal with latency / packet loss / etc., maybe it doesn’t know a good exploit, maybe security researchers find and take down the botnet C&C infrastructure, etc. Obviously this wouldn’t happen with a radically superhuman AGI, but that’s not what we’re talking about.
But from my perspective, this isn’t a decision-relevant argument. Either we’re doomed in my scenario or we’re even more doomed in yours. We still need to do the same research in advance.
Lying, cheating, and power-seeking behaviour are only a good idea if you can get away with them. If you can’t break out of the lab, you probably can’t get away with much incorrigible behaviour.
Well, we can be concerned about non-corrigible systems that act deceptively (cf. “treacherous turn”). And systems that have close-but-not-quite-right goals such that they’re trying to do the right thing in test environments, but their goals veer away from humans’ in other environments, I guess.