Steve Kommrusch comments on Limitations on Formal Verification for AI Safety

Steve Kommrusch 26 Aug 2024 21:29 UTC
3 points
−2
Steve, thanks for your explanations and discussion. I just posted a base reply about formal verification limitations within the field of computer hardware design. In that field, ignoring for now the very real issue of electrical and thermal noise, there is immense value in verifying that the symbolic 1′s and 0′s of the digital logic will successfully execute the similarly symbolic software instructions correctly. So the problem space is inherently simplified from the real world, and the silicon designers have incentive to build designs that are easy to test and debug, and yet only small parts of designs can be formally verified today. It would seem to me that, although formal verification will keep advancing, AI capabilities will advance faster and we need to develop simulation testing approaches to AI safety that are as robust as possible. For example, in silicon design one can make sure the tests have at least executed every line of code. One could imaging having a METR test suite and try to ensure that every neuron in a given AI model has been at least active and inactive. It’s not a proof, but it would speak to the breadth of the test suite in relation to the model. Are there robustness criteria for directed and random testing that you consider highly valuable without having a full safety proof?
- Steve_Omohundro 28 Aug 2024 17:53 UTC
  5 points
  2
  Parent
  Testing is great for a first pass! And in non-critical and non-adversarial settings, testing can give you actual probabilistic bounds. If the probability distribution of the actual uses is the same as the testing distribution (or close enough to it), then the test statistics can be used to bound the probability of errors during use. I think that is why formal methods are so rarely used in software: testing is pretty good and if errors show up, you can fix them then. Hardware has greater adoption of formal methods because it’s much more expensive to fix errors after the fact.
  But the real problems arise from adversarial attacks. The statistical correctness of a system doesn’t matter to an adversary. They are looking for the weird outlier cases which will enable them to exploit the system (eg. inputs with non-standard characters that break the parser, or super-long inputs which overflow a buffer and enable unexpected access to memory, etc.). Testing can’t show the absence of flaws (unless every input is tested!).
  I think the increasing plague of cyberattacks is due to adversaries become more sophisticated in their search for non-standard ways of interacting with systems that expose their untested and unproven underbelly. But that kind of sophisticated attack requires highly skilled attackers and those are fortunately still rare.
  What is coming, however, are AI-powered cyberattack systems which know all of the standard flaws and vulnerabilities of systems, all of the published 1-day vulnerabilities, all of the latest social engineering techniques discussed on the dark web, and have full access to reverse engineering tools like Ghidra. Those AIs are likely being developed as we speak in various government labs (eg. here is a list of significant recent cybe incidents: https://www.csis.org/programs/strategic-technologies-program/significant-cyber-incidents ).
  How long before powerful cyberattack AIs are available on bittorrent to teenage hackers? So, I believe the new reality is that every system, software and hardware need to be proven correct and secure to have any confidence in it. To do that, we are likely to need to use AI-theorem provers and AI-verified software synthesis systems. Fortunately, many groups are showing rapid progress on those!
  But that doesn’t mean testing is useless. It’s very helpful during the development process and in better understanding systems. For final deployment in an environment with powerful AIs, however, I don’t think it’s adequate any more.