David, I really like that your description discusses the multiple interacting levels involved in self-driving cars (e.g., software, hardware, road rules, laws, etc.). Actual safety requires reasoning about those interacting levels and ensuring that the system as a whole doesn’t have vulnerabilities. Malicious humans and AIs are likely to try to exploit systems in unfamiliar ways. For example, here are 10 potential harmful actions (out of many, many more possibilities!) from which AI-enabled malicious humans or AIs could benefit by controlling self-driving cars in harmful ways:
1) Murder for hire: Cause accidents that kill a car’s occupant, another car’s occupant, or pedestrians
2) Terror for hire: Cause accidents that injure groups of pedestrians
3) Terror for hire: Control vehicles to deliver explosives or disperse pathogens
4) Extortion: Demand payment to not destroy a car, harm the driver, or harm others
5) Steal delivery contents: Take over delivery vehicles to steal their contents
6) Steal car components: Take over cars to steal components like catalytic converters
7) Surreptitious delivery or taxis: Control vehicles to earn money for illegal deliveries or use in illegal activities
8) Insurance fraud for hire: Cause damage to property for insurance fraud
9) Societal distraction for hire: Cause multiple crashes to overwhelm the societal response
10) Blocked roads for hire: Control multiple vehicles to block roadways for social harm or extortion
To prevent these and many related harms, safety analyses must integrate reasoning across multiple levels. We need formal models of the relevant dynamics of each level and proofs of adherence to overall safety criteria. For societal trust in the actual safety of a self-driving vehicle, I believe manufacturers will need to provide machine-checkable “certificates” of these analyses.
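As a purely illustrative sketch (all names here are hypothetical, not from any existing tool or standard), such a certificate might bundle one proof obligation per level, each pointing at a proof artifact that an independent checker can verify before the vehicle is certified:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Obligation:
    """One safety claim at one level of the system."""
    level: str           # e.g. "software", "hardware", "road rules"
    claim: str           # the property, stated in a formal spec language
    proof_artifact: str  # reference to a machine-checkable proof of the claim

REQUIRED_LEVELS = {"software", "hardware", "road rules", "security model"}

def certificate_accepted(obligations: list[Obligation],
                         proof_checks: Callable[[str, str], bool]) -> bool:
    """Regulator-side check: every required level must be covered, and every
    attached proof must pass an independent, trusted proof checker."""
    covered = {o.level for o in obligations}
    return REQUIRED_LEVELS <= covered and all(
        proof_checks(o.claim, o.proof_artifact) for o in obligations
    )
```

The important design property is that acceptance depends only on the cheap, trusted proof checker, not on trusting the manufacturer’s own analyses.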
Thanks Nora for an excellent response! With R1 and o3, I think we are rapidly moving to the point where AI theorem proving, autoformalization, and verified software and hardware synthesis will be powerful and widely available. I believe it is critically important for many more people to understand the importance of formal methods for AI Safety.
Computer science, scientific computing, cryptography, and other fields have repeatedly had “formal methods skeptics” who argue for “sloppy engineering” rather than formal verification. This has been a challenge ever since Turing proved the first properties of programs back in 1949. It hit a tragic peak in 1979, when three leading computer scientists wrote a CACM paper, “Social processes and proofs of theorems and programs” (https://dl.acm.org/doi/10.1145/359104.359106), arguing against formal methods in computer science and stating that “It is felt that ease of formal verification should not dominate program language design.” That paper set back computer science by decades, and we have seen the disastrous consequences of that attitude over the past 45 years. Here’s a 2002 comment on those wars: “The Proof of Correctness Wars” https://cacm.acm.org/practice/the-proof-of-correctness-wars/
I only just saw the “Formal Methods are Overrated” video and tried to post a response both on YouTube and LinkedIn but those platforms don’t allow comments of more than a paragraph. So let me post it here as an addition to your fine post:
I am a co-author on two of the papers you mentioned (“Toward Guaranteed Safe AI” https://arxiv.org/abs/2405.06624 and “Provably Safe AI” https://arxiv.org/abs/2309.01933). Formal verification is critically important for AI safety, but with a reversed implication from what you described. It’s not that formal verification suddenly makes safety and security problems easy. It’s that if a complex system *doesn’t* have a formal proof of its safety properties, then it is very likely to be unsafe. And as AI systems become more powerful, they are likely to find every safety or security vulnerability and exploit it. This point was clearly explained in the paper with Max. Contrary to your video, formal methods are *underrated*, and *any* approach to AI safety and security which is effective against powerful AIs must use them. For some reason, ever since Turing’s 1949 demonstration of proofs of program properties, so-called “experts” have opted for “sloppy” design rather than creating verified designs using humanity’s most powerful tool: mathematics.
Humanity’s unwillingness to step up and create provably correct technology is why we are in the sorry state we’re in, with incredibly buggy and insecure infrastructure. Fortunately, new AI models like OpenAI’s o3 appear to be superhuman in mathematics, and perhaps in autoformalization, theorem proving, and verified software synthesis, and may usher in a new era of verified software, hardware, and social protocols.
The points you made don’t affect the importance of formal proof for AI safety. Your first point was that “weights are intractable” in complex neural nets. There is a fair amount of work on formally verifying properties of simpler neural nets. For example, the book “Introduction to Neural Network Verification” https://arxiv.org/abs/2109.10317 has a nice treatment of computing provable interval, zonotope, and polyhedron bounds on network behavior. But for the systems of current interest, like transformers, I agree that this approach isn’t likely to be practical. That’s why in those two papers we focus on “gatekeepers” which sit between complex neural systems and the actions they take in the world. This VAISU talk I gave summarizes the approach: “Provably Safe AI – Steve Omohundro” https://www.youtube.com/watch?v=7ON6QSnHCm4&ab_channel=HorizonEvents The complicated LLM or other AI system can do whatever computations it wants, but it must generate both a solution to a problem and a proof that the solution meets the spec. If it is controlling a physical system, that spec would include safety criteria that the solution must meet. The gatekeeper checks the proof (a simple and cheap operation) and only allows the solution to affect the world if the proof verifies.
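To make the asymmetry concrete, here is a toy Lean 4 sketch (the speed-limit spec and all names are invented for illustration, not taken from the papers): however much search the untrusted AI spends producing a proof, the gatekeeper’s job reduces to the kernel’s cheap, deterministic check that the proof term has the stated type.

```lean
-- Hypothetical toy spec: a commanded speed must not exceed the limit.
def speedLimit : Nat := 30

-- The safety property the AI's proposed command must satisfy.
abbrev CommandOk (v : Nat) : Prop := v ≤ speedLimit

-- The AI's proposed "solution" ...
def proposedSpeed : Nat := 25

-- ... and the proof it must ship alongside it. `by decide` stands in for
-- whatever search produced the proof; the gatekeeper only re-checks the
-- resulting term before letting the command affect the world.
theorem proposedSpeed_ok : CommandOk proposedSpeed := by decide
```

If the AI cannot supply such a proof, the gatekeeper simply refuses to pass the command along.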
Your second point is that “reality is complicated + unknown”. Yes, but here we are talking about *designing* systems in which we have high confidence of safety. Every engineering discipline already deals with the complexity of the world and has developed techniques and models which simplify the problems to the point where humans can solve them. Semiconductor specialists create design rules which provide strong guarantees that their chips behave as desired. For security, we need a model of the adversary. If the adversary can only attack through the designed digital channels, we only need formal proofs of security at the software level. Physical attacks require models of the physical device and proofs of security against a class of adversary, which goes beyond today’s common practice. You mentioned our example of DNA synthesis machines. We certainly weren’t proposing some kind of automatic system to determine which DNA sequences are safe for humans. Rather, we assume a “whitelist” of allowed safe sequences (e.g., determined by a biosafety board). The safety challenge was the much simpler task of ensuring that rogue AIs cannot synthesize sequences which are not on the whitelist. Today’s DNA synthesis machines allow anybody to synthesize anything. We argue that as AI becomes extremely powerful, we need a “gatekeeper” between the AI and the synthesis machine which checks for allowed sequences. How to do that in a way which can’t be physically subverted is described in my video above.
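In software terms, the whitelist check itself is tiny; here is a minimal sketch with made-up names and placeholder sequences (the hard part, making the check physically unsubvertible, is what the hardware discussion in the talk addresses). Storing the whitelist as hashes is just one plausible choice so the gatekeeper needn’t hold the sequences themselves:

```python
import hashlib

# Whitelist of approved sequences (e.g., as determined by a biosafety board),
# stored as SHA-256 digests. The sequences below are placeholders.
ALLOWED_DIGESTS = {
    hashlib.sha256(seq.encode()).hexdigest()
    for seq in ("ATGGCCPLACEHOLDER1", "ATGTTTPLACEHOLDER2")
}

def gatekeeper_allows(requested_sequence: str) -> bool:
    """Release a synthesis job only if the exact sequence is whitelisted."""
    digest = hashlib.sha256(requested_sequence.encode()).hexdigest()
    return digest in ALLOWED_DIGESTS

def synthesize(requested_sequence: str) -> None:
    if not gatekeeper_allows(requested_sequence):
        raise PermissionError("Sequence is not on the approved whitelist")
    # ... hand the job to the synthesis hardware ...
```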
Finally, you argue that “‘tool AI’ is unstable + uncompetitive”. What Max Tegmark and others are saying here is that we don’t need AI systems with uncontrolled *agency*. That is, they don’t need their own goals and the ability to take arbitrary actions in the world to achieve those goals. Any problem that humans want solved can be solved just as well by a powerful AI which only accepts formal problem specs and provides solutions together with formal proofs that those solutions meet the specs. There is no need to eliminate machine learning! But we absolutely must prevent AIs from running rampant over the human infrastructure. You suggest that this would be “unstable” in the sense that it’s trivial to convert a powerful tool into a powerful agent. My talks and writings address exactly that, and formal methods are a critical part of the answer. It would be a simpler problem if powerful AIs required trillion-dollar data centers which could be constrained by government regulation. Unfortunately, recent results like DeepSeek’s R1 or Microsoft’s “rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking” https://arxiv.org/abs/2501.04519 suggest that even tiny models running on cell phones may be powerful enough to cause harm. We need to create a trustable infrastructure which cannot be subverted by powerful AI, using the techniques I describe in my talks. We are in a race between using AI for capabilities and using it for safety. Formal methods are a critical tool for creating that trustable infrastructure.
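To be explicit about the tool-AI pattern being proposed (a rough sketch, with all names hypothetical): the only path from the untrusted solver to the world runs through a small trusted proof checker, so the solver can use arbitrary machine learning internally without ever needing open-ended agency.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Spec:
    """A formal problem spec, including the safety criteria a solution must meet."""
    text: str

@dataclass(frozen=True)
class Result:
    solution: str
    proof: str  # a machine-checkable proof that `solution` satisfies the spec

def run_tool_ai(spec: Spec,
                untrusted_solver: Callable[[Spec], Result],
                trusted_checker: Callable[[Spec, Result], bool]) -> str:
    """The solver may be arbitrarily powerful and fully untrusted; only a
    solution whose proof passes the cheap trusted check is ever released."""
    result = untrusted_solver(spec)
    if not trusted_checker(spec, result):
        raise ValueError("Proof failed to verify; solution rejected")
    return result.solution
```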