This is an excellent idea. An encrypted, airgapped, or paper library for coordination between AI researchers seems crucial for AGI safety.
This is because we should expect, in the worst-case scenario, that the AGI will be trained on the whole Internet, including any online discussion of our interpretability tools, security research, and so on. This is information the AGI can use against us (e.g., by exploiting knowledge of our interpretability tools to hack, deceive, or otherwise socially engineer the alignment researchers).
Security through obscurity can buy us more chances at aligning or retraining the AGI before it escapes onto the Internet. We should keep our battle plans close to our chest, rather than posting them online for the AGI to see.
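As a minimal sketch of what the encrypted library mentioned above might look like in practice (assuming Python and the third-party cryptography package; the file name and key-handling scheme are hypothetical placeholders, not a vetted design):

```python
# Minimal sketch: symmetric encryption of a research note for an offline,
# access-controlled library. Assumes the third-party `cryptography` package;
# the file name and key-handling scheme are hypothetical placeholders.
from cryptography.fernet import Fernet

# Generate a key once and store it separately from the ciphertext
# (e.g., on paper or a hardware token held by the library's custodian).
key = Fernet.generate_key()
fernet = Fernet(key)

note = b"Internal notes on interpretability tooling -- not for publication."
ciphertext = fernet.encrypt(note)

with open("note_0001.enc", "wb") as f:
    f.write(ciphertext)

# A researcher with the key can later decrypt the note on an airgapped machine.
assert fernet.decrypt(ciphertext) == note
```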
I suppose that’s an additional consideration. Keeping potentially concerning material out of trivially scraped training sets is pretty low cost and worth it.
I wouldn’t want to sacrifice much usability beyond the standard security measures to focus on that angle, though; that would mean trying to directly fight a threat which is (1) already able to misuse observed research, (2) already able to otherwise socially or technically engineer its way into that research, and (3) somehow not already massively lethal without it.
In general, it is much easier to keep potentially concerning material out of the AGI’s training set if it’s still a secret than if it has already been published on the Internet: once it’s public, there may be copies, references, and discussions of the material scattered across the scraped data that we fail to catch.
If it’s already been posted on the Internet and it’s too late for secrecy, we should of course still try our best to filter it out of the training set.
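To make the filtering difficulty concrete, here is a toy sketch (assuming Python; the blocklist phrase and example documents are hypothetical) of a naive exact-phrase filter over scraped text. It catches verbatim copies but misses paraphrases and second-hand discussions, which is exactly why already-published material is hard to keep out:

```python
# Toy sketch of a naive training-data filter. The blocklist phrase and the
# example documents are hypothetical; a real pipeline would need fuzzy matching,
# embeddings, or human review to catch paraphrases and discussions.
BLOCKLIST = [
    "acme interpretability probe v2",  # hypothetical sensitive phrase
]

def is_clean(document: str) -> bool:
    """Return True if the document contains no blocklisted phrase verbatim."""
    text = document.lower()
    return not any(phrase in text for phrase in BLOCKLIST)

scraped = [
    "Release notes for the ACME interpretability probe v2.",     # caught and removed
    "Someone on a forum summarised ACME's new probing method.",  # missed: paraphrase
]

training_set = [doc for doc in scraped if is_clean(doc)]
print(training_set)  # only the paraphrased discussion survives the filter
```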
As for the question of “should we give up on security after the AGI attains high capabilities?”: we shouldn’t give up as long as our preparations could non-negligibly increase our probability of escaping doom, even if that increase is small. We should always maximize expected utility, even if we are probably doomed.
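As a toy illustration of that last point (every number below is a made-up placeholder, not an estimate), even a small increase in survival probability dominates the expected-utility comparison when the downside is total:

```python
# Toy expected-utility comparison; all probabilities, utilities, and costs
# are made-up placeholders for illustration only.
U_SURVIVE, U_DOOM = 1.0, 0.0   # normalised utilities
COST_OF_PREP = 0.001           # small utility cost of maintaining security

p_survive_without_prep = 0.01
p_survive_with_prep = 0.02     # preparation adds a small but non-negligible chance

eu_without = p_survive_without_prep * U_SURVIVE + (1 - p_survive_without_prep) * U_DOOM
eu_with = p_survive_with_prep * U_SURVIVE + (1 - p_survive_with_prep) * U_DOOM - COST_OF_PREP

print(eu_without, eu_with)  # preparation still comes out ahead (~0.019 vs 0.01)
```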