I’d expect AI checks and balances to have some benefits even if AIs are engaged in advanced collusion with each other. For example, AIs rewarded for identifying and patching security vulnerabilities would likely reveal at least some genuine security vulnerabilities, even if they were coordinating with other AIs to try to keep the most important ones hidden.
This seems not to be a benefit. What we need is to increase the odds of finding the important vulnerabilities. Collusion that reveals a few genuine vulnerabilities seems likely to lower our odds by giving us misplaced confidence. I don’t think this is an artefact of the example: wherever we’re unaware of collusion, but otherwise accurate in our assessment of check-and-balance systems’ capabilities, we’ll be overconfident in the results.
In many cases, the particular vulnerabilities (or equivalent) revealed will be selected at least in part to maximise our overconfidence.
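To make the overconfidence worry concrete, here’s a minimal simulation sketch. The setup and numbers (ten vulnerabilities, exactly one critical, a 70% per-vulnerability detection rate) are illustrative assumptions of mine, not anything from the scenario above:

```python
import random

random.seed(0)

# Illustrative assumptions: N vulnerabilities, index 0 is the critical one.
# A checker finds each vulnerability independently with probability P_FIND.
# An honest checker reports everything it finds; a colluding checker reports
# the minor finds but suppresses the critical one.
N, P_FIND, TRIALS = 10, 0.7, 100_000

def simulate(colluding: bool):
    """Return (mean genuine reports, prob. the critical vuln goes unreported)."""
    total_reports = 0
    critical_survives = 0
    for _ in range(TRIALS):
        found = [random.random() < P_FIND for _ in range(N)]
        critical_reported = found[0] and not colluding
        total_reports += sum(found[1:]) + critical_reported
        critical_survives += not critical_reported
    return total_reports / TRIALS, critical_survives / TRIALS

for label, colluding in [("honest", False), ("colluding", True)]:
    mean_reports, p_survive = simulate(colluding)
    print(f"{label:>9}: mean genuine reports = {mean_reports:.2f}, "
          f"critical vuln survives with prob {p_survive:.2f}")
```

An overseer who trusts the honest-checker model would estimate that the critical vulnerability survives about 30% of the time, and the near-identical streams of genuine reports (≈7.0 vs ≈6.3 on average) give them little reason to suspect anything is off; under collusion it survives every time.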
It seems like we could simply try to be as vigilant elsewhere as we would be without this measure, and then we could reasonably expect this measure to be net-beneficial (*how* net beneficial is debatable).