People seem to be getting the implication we intended backwards. We’re certainly not saying “For any random safety property you might want, you can use formal methods to magically find the rules and guarantee them!” What we are saying is “If you have a system which actually guarantees a safety property, then there is a formal proof of that, and there are many, many benefits to making it explicit where practical.” We’re not proposing any new physics, any new mathematics, or even any new AI capabilities beyond further development of current capabilities in theorem proving, verified software synthesis, and autoformalization.
Humanity is in a crisis right now! Even without AI, we have disasters like the $5 billion CrowdStrike flaw a few weeks ago, the many cyberattacks disabling critical systems, Boeing airplanes falling apart in the sky, etc. As open source AI cyberattack models advance, every one of today’s flaws and security holes is likely to be exploited by a wide variety of malicious actors. We have clear descriptions of the desired safety properties for many of today’s issues, and design rules have been developed to mitigate the problems. But today’s software development and engineering practices aren’t delivering the safety we need! Software engineers wing it and introduce flaws into their systems, security holes go unfixed for decades, and Boeing outsources the manufacture of jet components to underpaid and unsupervised third parties.
By creating explicit formal representations of these rules, and formal “digital twins” of our systems, we can obtain guarantees that the rules are being followed. Our papers and my talks describe many techniques for doing this, but we are just scratching the surface. We need many more people thinking about how to achieve actual safety in the face of today’s and tomorrow’s AIs. A minimal sketch of the pattern appears below.
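To make the shape of the idea concrete, here is a minimal sketch in Lean 4: a toy “digital twin” of a one-variable system, a design rule encoded in its step function, and a machine-checked proof of the safety property. Everything here (the names `step`, `reachable`, `safe`, and the bound of 100) is an illustrative assumption of mine, not taken from our papers; a real system would need a far richer model.

```lean
-- Toy "digital twin" of a one-variable system. All names are illustrative.

-- Design rule encoded in the dynamics: never step the state past 100.
def step (x : Nat) : Nat :=
  if x < 100 then x + 1 else x

-- The state reachable after n steps, starting from 0.
def reachable : Nat → Nat
  | 0     => 0
  | n + 1 => step (reachable n)

-- The safety property, stated explicitly and machine-checked:
-- every reachable state obeys the bound.
theorem safe (n : Nat) : reachable n ≤ 100 := by
  induction n with
  | zero => decide
  | succ m ih =>
    unfold reachable step
    split <;> omega
```

Even this trivial example has the key structure: the design rule lives in the model, the safety criterion is an explicit theorem, and a proof checker, rather than a code review, vouches for the guarantee.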
I’m a little shocked at the low level of curiosity and creativity I’m seeing in these discussions. I have not seen any other proposals for dealing with the actual problems that real AI systems are likely to cause in the next few years. Attempts at alignment, red teaming, etc. are all very interesting and valuable, but they do nothing to address the hundreds of millions of open source AIs which are almost as powerful as the frontier closed models and which are easily fine-tuned for cyberattacks and other harmful actions.
The general approach we have outlined is not optional! Gödel’s completeness theorem tells us that there is a formal proof of any safety property which actually holds for a system. Complex systems are very unlikely to be accidentally safe, so designing and building them with explicit formal models will be necessary. If humans don’t do it, then powerful AIs certainly will. But in that case, we may have very little input into exactly which properties are guaranteed.
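For precision, the standard textbook form of the theorem (quoted here for reference, not from our papers): for a first-order theory $T$ and sentence $\varphi$,

$$T \models \varphi \;\Longrightarrow\; T \vdash \varphi,$$

that is, anything true in every model of the theory has a formal proof from its axioms. The step the argument needs is a formal model of the system faithful enough that the safety property is a semantic consequence of it; given that, a proof exists in principle, and the engineering work is constructing the model and finding the proof.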
Our proposal is not simple! But it doesn’t rely on any exotic new ideas. It does require creativity and dedication to make it work, and to ensure that humanity survives the next decade. There would be tremendous benefit simply in precisely encoding today’s safety criteria and in creating systems that guarantee those criteria are followed. With some future creativity, we can go further and use these techniques to develop much more accurate safety criteria and better designs. I would have expected huge interest and forward movement on this, and I’m sad not to see much of that so far.