I expect one or two insiders wouldn’t be enough: the actual technical implementation of the AI’s target will require coordination among, say, a dozen people at different stages of the process, or at least the target will be visible/verifiable to many people at those stages. And if the people in charge actually understand the stakes, it’ll probably then be cross-reviewed by an entirely different group before being deployed.
There’s a limit on how much siloing and information security a state-sponsored AGI research team can afford when it’s competing with teams like DeepMind that don’t necessarily have to operate under the same constraints. My guess is that there will be a dozen or so managers and system administrators capable of doing this sort of thing by default. If team members are incapable of individually modifying and understanding their AGI systems, how are they supposed to keep up with the leading edge?
It’d still be possible to steal the lightcone unilaterally or with 1-2 collaborators, but it’d require defeating security measures built specifically against this sort of thing. I.e., the rogue actor would need to be (1) someone in a position on the project with the skills to code in a different target, (2) willing to defy orders/ideology/procedure head-on like this, and (3) competent at conspiracy.
It seems much more likely to me that #2 and #3 will end up being satisfied by leading AI researchers than by typical military and police officials. Not because being a genius automatically makes you a nonconformist, but because there is much less slack and data available for filtering AI researchers for loyalty than when, e.g., appointing a new FSB chief. And the more these governments reject Von Neumanns in favor of less risky applicants, the harder it will be for them to compete.
There’s a limit on how much siloing and information security a state-sponsored AGI research team can afford when it’s competing with teams like DeepMind that don’t necessarily have to operate under the same constraints.
Mm, I’m not sure we’re talking about the same problem? I’m saying that a lot of people will have read access, and each of them would be able to notice that there’s something very wrong with the AI’s target; for the conspiracy to succeed, all of them would then need to not raise a fuss about it.
there is much less slack available for selecting AI researchers for loyalty
In which case there’ll be more oversight over them, no?
In addition, the leading engineers won’t necessarily need to be genius-level. The people doing foundational alignment research would need to be, but if we’re in a hypothetical where we have alignment tools good enough to avoid omnicide, we’re past the stage where theory was the bottleneck. In that world, verifying the AI’s preferences should be tractable even for merely normally-competent ML engineers.
Which you’d then select for loyalty and put in the oversight team.