There should maybe exist an org whose purpose it is to do penetration testing on various ways an AI might illicitly gather power. If there are vulnerabilities, these should be disclosed with the relevant organizations.
For example: if a bank doesn’t want AIs to be able to sign up for an account, the pen-testing org could use a scaffolded AI to check if this is currently possible. If the bank’s sign-up systems are not protected from AIs, the bank should know so they can fix the problem.
One pro of this approach is that it can be done at scale: it’s pretty trivial to spin up thousands AI instances in parallel to try to attempt to do things they shouldn’t be able to do. Humans would probably need to inspect the final outputs to verify successful attempts, but the vast majority of the work could be automated.
One hope of this approach is that if we are able to patch up many vulnerabilities, then it could be meaningfully harder for a misused or misaligned AI to gain power or access resources that they’re not supposed to be able to access. I’d guess this doesn’t help much in the superintelligent regime though.
There should maybe exist an org whose purpose it is to do penetration testing on various ways an AI might illicitly gather power. If there are vulnerabilities, these should be disclosed with the relevant organizations.
For example: if a bank doesn’t want AIs to be able to sign up for an account, the pen-testing org could use a scaffolded AI to check if this is currently possible. If the bank’s sign-up systems are not protected from AIs, the bank should know so they can fix the problem.
One pro of this approach is that it can be done at scale: it’s pretty trivial to spin up thousands AI instances in parallel to try to attempt to do things they shouldn’t be able to do. Humans would probably need to inspect the final outputs to verify successful attempts, but the vast majority of the work could be automated.
One hope of this approach is that if we are able to patch up many vulnerabilities, then it could be meaningfully harder for a misused or misaligned AI to gain power or access resources that they’re not supposed to be able to access. I’d guess this doesn’t help much in the superintelligent regime though.
there are many such orgs, they’re commonly known as fraudsters and scammers
Problem with scammers is that they do not report successful penetration of defense.