On one hand, while we should recognize that modeling techniques like discrete element analysis can produce quantitative estimates of real-world behavior – for example, how likely a drone is to crash, or how likely a bridge is to fail – we should not lose sight of the fact that such estimates are just that: estimates, not guarantees. From a practical standpoint, estimates of this sort for real-world systems also tend to be based on empirical studies of past results rather than prospective modeling, and even when prospective estimates are given for real-world systems, they are still treated as estimates, not guarantees.
I think this is a critical point. Every engineering discipline has precise models of their domain. For example, I’m looking at Peter Childs’ “Mechanical Design Engineering Handbook” which surveys the field and is full of detailed partial differential equations for the behavior of different components like bearings, gears, clutches, seals, springs, fasteners, etc. It also has many tables showing the values (with significant figures) of parameters like viscosity of lubricants and the fatigue load of ball bearings.
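To make the flavor concrete, here is a minimal sketch (not taken from Childs’ handbook; the load numbers are made up) of the kind of component model such handbooks tabulate: the standard basic rating life formula for rolling bearings, L10 = (C/P)^p million revolutions, with p = 3 for ball bearings.

```python
# Illustrative sketch: the standard basic rating life formula for rolling bearings
# (L10 = (C/P)^p million revolutions, p = 3 for ball bearings). The numbers below
# are hypothetical; a handbook like Childs' tabulates real C values per bearing.

def basic_rating_life_mrev(dynamic_load_rating_kN: float,
                           equivalent_load_kN: float,
                           exponent: float = 3.0) -> float:
    """L10 life in millions of revolutions that 90% of identical bearings survive."""
    return (dynamic_load_rating_kN / equivalent_load_kN) ** exponent

# Hypothetical deep-groove ball bearing: C = 35 kN rating, P = 4 kN applied load.
life_mrev = basic_rating_life_mrev(35.0, 4.0)
rpm = 1500
life_hours = life_mrev * 1e6 / (rpm * 60)
print(f"L10 life: {life_mrev:.0f} million revolutions, about {life_hours:.0f} hours at {rpm} rpm")
```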
How did they get these models and parameters? It’s a combination of inference from fundamental physics and experimental tests. Given more detailed physics understanding, fewer experiments are needed. And even with no physics, probabilistic models with PAC-like guarantees can be built purely from experiments if the test distribution is the same as the training distribution.
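As a minimal sketch of what a PAC-like guarantee built purely from experiments can look like (my illustration, not from any handbook): a one-sided Hoeffding bound turns n i.i.d. tests into a high-confidence upper bound on the true failure probability, valid only if the deployment distribution matches the test distribution.

```python
# Minimal sketch of a PAC-style guarantee from experiments alone: with n i.i.d.
# tests drawn from the distribution the system will actually face, a one-sided
# Hoeffding bound gives "with probability >= 1 - delta, the true failure
# probability is at most p_hat + sqrt(ln(1/delta) / (2n))".
import math

def failure_prob_upper_bound(failures: int, n_tests: int, delta: float = 1e-6) -> float:
    p_hat = failures / n_tests
    return p_hat + math.sqrt(math.log(1.0 / delta) / (2 * n_tests))

# Even 0 failures in 100,000 trials only certifies p <= ~0.83% at delta = 1e-6;
# tight guarantees from testing alone require enormous sample sizes.
print(failure_prob_upper_bound(0, 100_000))
```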
To build safe systems, engineers do computations on these models and bound the probability of system failure (hopefully to a low value!). These computations should be purely mathematical, with precise answers verified by mathematical proofs. Unfortunately, that has not been the culture in “scientific computing” or “numerical methods” in the United States, where the standard advice has instead been to have “numerical analysts” look over your code and ensure that it is “numerically stable”!
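Here is a tiny illustration (mine, not from any particular textbook) of the kind of silent failure those “numerical stability” reviews are supposed to catch: two algebraically identical formulas for (1 − cos x)/x², whose true value is about 0.5 for small x, one of which returns exactly 0 with no warning.

```python
# Catastrophic cancellation: two algebraically identical expressions for
# (1 - cos x)/x^2, whose true value is ~0.5 for small x.
import math

x = 1e-8
naive  = (1 - math.cos(x)) / x**2        # cos(1e-8) rounds to 1.0, so this returns 0.0
stable = 2 * math.sin(x / 2)**2 / x**2   # rewritten form: returns ~0.5

print(naive, stable)  # 0.0 vs ~0.5 -- the naive result is completely wrong, silently
```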
In my book, that culture has been a disaster, leading to many failures and deaths. The Europeans developed a culture of “verified scientific computing” (for example, I was enamored of this book: “Numerical Toolbox for Verified Computing I: Basic Numerical Problems – Theory, Algorithms, and Pascal-XSC Programs” from the Springer Series in Computational Mathematics, which shows how to produce provably correct enclosures around solutions to equations, find global optima, etc.). These days these models can be formalized in a system like Lean and their properties deduced using mathematical proof.
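As a sketch of the enclosure style of computing that book teaches (toy Python rather than Pascal-XSC, and only outward-rounded addition and multiplication): every operation widens its result by an ulp in each direction, so the computed interval provably contains the exact real answer. If the enclosure of x² − 2 over [1.5, 2] excludes zero, that is a proof that no root lies in that range.

```python
# Minimal sketch of interval ("enclosure") arithmetic: round lower bounds down and
# upper bounds up, so results provably contain the exact real answer.
# (Toy code, not from the Pascal-XSC toolbox.)
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

def f(x: Interval) -> Interval:
    """Enclosure of x^2 - 2 over the input box."""
    return x * x + Interval(-2.0, -2.0)

# The enclosure is strictly positive, which proves (up to the correctness of this
# arithmetic) that x^2 - 2 has no root anywhere in [1.5, 2.0].
print(f(Interval(1.5, 2.0)))  # roughly Interval(lo=0.25 - a few ulps, hi=2.0 + a few ulps)
```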
How much accuracy do we need, and what should we do about incorrect models? For safety, the answer depends on the strength of the adversary. If your adversary is a typical human, then weak models often suffice. Powerful AI adversaries, however, will look for weaknesses in the design model and try to exploit them. To have any hope of providing actual security, these attack processes must be modeled and accounted for.
Most engineering disciplines actually have a kind of “stack” of models, each layer justified by the layer below. For example, in chip design there is a digital circuit layer on top of a physical layout layer on top of detailed electrodynamics and solid state materials, etc. Today’s chips do a pretty good job of isolating the layers, so a digital circuit laid out according to the chip process node’s design rules will actually implement the intended digital behavior.
But attacks like “Rowhammer” show what can happen when an adversary can affect a layer below the digital layer. By repeatedly accessing certain memory cells, an attacker can flip nearby bits and use this to extract cryptographic keys and violate security “guarantees”. As DRAMs have gotten denser, the problem has gotten worse. The human-designed mitigations have led to a cat-and-mouse game that the attackers are winning. I believe that’s a great example of a critically important design problem which can only be solved by using AI and formal methods. The ideal solution would be to design chips which provably implement the intended digital model. But we have a lot of bad DRAM chips out there! I believe we need a formal electrodynamic model of the phenomenon and a provable representation of all access patterns which lead to violations. Then we need verified AI program synthesis to resynthesize programs so that they provably never generate any of the bad access patterns. I don’t think any other technique has a chance of dealing with that kind of issue. And you can be sure that advanced AIs will be hammering away at weak computational infrastructure.
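To make that last step slightly more concrete, here is a toy sketch (my own; the hammer threshold and refresh window are hypothetical and would in practice have to come from the formal electrodynamic model of the specific parts) of what a “representation of bad access patterns” could look like: a predicate over DRAM row-activation traces that a verified synthesizer would have to prove its output programs never satisfy.

```python
# Toy model of "bad access patterns": count activations per DRAM row within one
# refresh window and flag any trace that drives a row past a hammer threshold.
# The constants below are hypothetical, not real per-part values.
from collections import Counter

HAMMER_THRESHOLD = 50_000        # hypothetical activations per refresh window
REFRESH_WINDOW_NS = 64_000_000   # 64 ms refresh interval, in nanoseconds

def violates_hammer_bound(trace):
    """trace: iterable of (timestamp_ns, row_id) DRAM row activations."""
    window_start = None
    counts = Counter()
    for t, row in trace:
        if window_start is None or t - window_start >= REFRESH_WINDOW_NS:
            window_start, counts = t, Counter()   # new refresh window, reset counters
        counts[row] += 1
        if counts[row] > HAMMER_THRESHOLD:
            return True   # this access pattern could disturb neighboring rows
    return False

# A verified synthesizer with this predicate as a side condition would have to
# produce code whose possible traces provably never make it return True.
hammering = [(i * 100, 7) for i in range(60_000)]          # 60k hits on row 7 in ~6 ms
benign    = [(i * 100, i % 1024) for i in range(60_000)]   # spread across 1024 rows
print(violates_hammer_bound(hammering), violates_hammer_bound(benign))  # True False
```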