I would say that some formal proofs are actually impossible.
Plausible. In the aftermath of Spectre and Meltdown I spent a fair amount of time thinking about how you could formally prove a piece of software to be free of information-leaking side channels, even assuming that the same property holds for all dependent components such as the underlying processors, operating systems, and the like, and got mostly nowhere.
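To make the difficulty concrete, here is a minimal sketch (plain C, function names my own invention): two byte-comparison routines with identical input-output behaviour, so any functional-correctness proof covers both equally, yet one of them leaks information about the secret through its running time. And even the "constant-time" variant only controls timing at the source level; whether that survives compilation, caching, and speculative execution is exactly the part that resists proof.

```c
/* Hedged sketch: two functionally equivalent comparisons, one of which
 * leaks through a timing side channel. Names and structure are
 * illustrative, not taken from any particular codebase. */
#include <stddef.h>

/* Leaky: returns at the first mismatching byte, so the running time
 * reveals how long a prefix of the secret the caller has guessed. */
int compare_leaky(const unsigned char *secret,
                  const unsigned char *guess, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (secret[i] != guess[i])
            return 0;            /* early exit: timing depends on the secret */
    }
    return 1;
}

/* Constant-time variant: always touches every byte and accumulates the
 * differences, so at the source level the timing does not depend on the
 * secret. A correctness proof sees no difference between the two
 * functions, and the hardware underneath may still betray you. */
int compare_const_time(const unsigned char *secret,
                       const unsigned char *guess, size_t n)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= (unsigned char)(secret[i] ^ guess[i]);
    }
    return diff == 0;
}
```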
In fact, I think survival timelines might even require anyone working on classes of software reliability that don’t relate to alignment to switch their focus to alignment at this point.
Does that include those working on software correctness and reliability in general, without a security focus? I would expect better tools for making software that is free of bugs, such as programs that carry correctness proofs as well as some of the lesser formal methods, to be on the critical path to survival, for the simple reason that any number of mundane programming mistakes in a supposedly-aligned AI could easily kill us all. I was under the impression that you agree with this [“Assurance Requires Formal Proofs”]. I expect formal proofs of security in particular to be largely a corollary of this: a C program that is proven to correctly accomplish any particular goal will necessarily not have any buffer overflows in it, since an overflow would invoke undefined behavior, which would make the proof not go through. This does not necessarily apply to all security properties, but I would expect it to apply to most of them.
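A hedged sketch of what that looks like in practice, using ACSL annotations in the style accepted by Frama-C (the function and its contract are my own illustration): to prove the contract below, the verifier must also discharge a validity obligation for every access to buf[i], so an out-of-bounds write, being undefined behaviour, simply leaves an unprovable goal rather than a latent overflow.

```c
/* Illustrative sketch, assuming an ACSL-style deductive verifier
 * such as Frama-C/WP; any tool in this family works similarly. */
#include <stddef.h>

/*@ requires n > 0;
    requires \valid(buf + (0 .. n-1));
    assigns buf[0 .. n-1];
    ensures \forall integer i; 0 <= i < n ==> buf[i] == 0;
*/
void zero_fill(unsigned char *buf, size_t n)
{
    /*@ loop invariant 0 <= i <= n;
        loop invariant \forall integer j; 0 <= j < i ==> buf[j] == 0;
        loop assigns i, buf[0 .. n-1];
        loop variant n - i;
    */
    for (size_t i = 0; i < n; i++) {
        buf[i] = 0;   /* in bounds by the \valid precondition and loop invariant */
        /* Writing buf[n] here instead would be undefined behaviour, and the
         * corresponding memory-safety obligation could not be closed. */
    }
}
```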