One assumption that could defuse the danger is that we can apply the controlled AIs to massively improve computer security, such that defending computers wins the race against attacking them.
This is at least semi-plausible for the first AIs, which will almost certainly be wildly superhuman at both coding and proving theorems in, for example, Lean: there is an obvious path to generating the training data needed to bootstrap these skills, and, as with Go, it is relatively easy to verify that a candidate solution actually works.
https://www.lesswrong.com/posts/2wxufQWK8rXcDGbyL/access-to-powerful-ai-might-make-computer-security-radically
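As a minimal sketch of why verification is cheap here (the specific theorems are illustrative, not from the linked post): once a proof or program is submitted, Lean's kernel checks it mechanically, so we can trust the result without trusting whoever, or whatever, produced it.

```lean
-- Any proof term an AI produces must pass the kernel's check; a wrong proof
-- is rejected no matter how persuasive the prover is.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The same applies to checking code against a specification.
def double (n : Nat) : Nat := n + n

-- Accepted only because `double n` really does reduce to `n + n`.
theorem double_eq (n : Nat) : double n = n + n := rfl
```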
Another assumption that could work for AI control is that once the AIs are controlled enough, labs start using the human-level AIs to further improve control/alignment strategies. From that base case we can inductively argue that each successive generation of AIs is even better controlled, up to some limit, and that the added capabilities are used to make the AIs safer rather than more dangerous.
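A hypothetical Lean rendering of the shape of this argument (the predicate `controlled` and the premise names are my labels, not the author's): the conclusion only goes through if the inductive step, that a controlled generation can make the next one at least as controlled, holds at every level.

```lean
variable (controlled : Nat → Prop)

-- Base case: the first (human-level) generation is controlled.
-- Inductive step: each controlled generation makes the next one controlled.
theorem every_generation_controlled
    (base : controlled 0)
    (step : ∀ n, controlled n → controlled (n + 1)) :
    ∀ n, controlled n := by
  intro n
  induction n with
  | zero => exact base
  | succ k ih => exact step k ih
```

Put this way, the replies below amount to doubting the `step` premise at the first generation that is smarter than humans.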
Improving computer security seems possible, but there are many other attack vectors. For instance, even if an AI can prove a system's software is secure, it may choose to introduce social-engineering-style back doors if it is not aligned. It's true that controlled AIs can be used to harden society, but overall I don't find that strategy comforting.
I’m not convinced that this induction argument goes through. I think it fails on the first generation that is smarter than humans, for basically Yudkowskian reasons.