It does help somewhat if your strategy is leveraged in ways that involve directing the attention of the cybersecurity field as a whole. It doesn't help much if your plan is just to hunt for vulnerabilities yourself.
Two things to disclaim. First: we are not within striking distance of making the security of the-internet-as-a-whole able to stand up to a superintelligence. All of the interesting work to be done is in contexts much narrower in scope, like test environments with small API surface area, and AI labs protecting their source code from human actors. Second: all of the cases where cybersecurity helps wind up bottoming out in buying time for something else, not solving the problem directly.
There are two main scenarios where cybersecurity could wind up mattering.
Scenario 1: The leading lab gets close to the threshold, and tries to pause while they figure out alignment details before they crank up the compute. Some other party steals the source code and launches the unfinished AI prematurely.
Scenario 2: A prototype AGI in the infrahuman range breaks out of a test or training environment. Had it not broken out, its misalignment would have been detected, and the lab that was training/testing it would’ve done something useful with the time left after halting that experiment.
I wrote a bit about scenario 2 in this paper. I think work aimed at addressing this scenario more or less has to be done from inside one of the relevant major AI labs, since their training/test environments are generally pretty bespoke and are kept internal.
I see some people here saying scenario 1 might be hopeless due to human factors, but I think this is probably incorrect. As a proof-of-concept, military R&D is sometimes done in (theoretically) airgapped facilities where employees are searched for USB sticks on the way out. Research addressing scenario 1 probably looks like figuring out how to capture the security benefits of that sort of work environment in a way that’s more practical and less intrusive.