I cannot judge to what degree I agree with your strategic assessment of this technique, though. I interpreted your top-level post as judging that assurances based on formal proofs are realistically out of reach as a practical approach; whereas my own assessment is that making proven-correct [and therefore proven-secure] software a practical reality is a considerably less impossible problem than many other aspects of AI alignment, and indeed one I anticipate will actually happen in a timeline in which aligned AI materializes.
I would say that some formal proofs are actually impossible, but would agree that software with many (or even all) of the security properties we want could actually have formal-proof guarantees. I could even see a path to many of these proofs today.
While the intent of my post was to draw parallel lessons from software security, I actually think alignment is an oblique or orthogonal problem in many ways. I could imagine timelines in which alignment gets ‘solved’ before software security. In fact, I think survival timelines might even require anyone working on classes of software reliability that don’t relate to alignment to switch their focus to alignment at this point.
Software security is important, but I don’t think it’s on the critical path to survival unless somehow it is a key defense against takeoff. Certainly many imagined takeoff scenarios are made easier if an AI can exploit available computing, but I think the ability to exploit physics would grant more than enough escape potential.
I would say that some formal proofs are actually impossible
Plausible. In the aftermath of Spectre and Meltdown I spent a fair amount of time thinking about how you could formally prove a piece of software to be free of information-leaking side channels, even assuming that the same holds for all dependent components such as the underlying processors and operating systems, and got mostly nowhere.
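To make the difficulty concrete, here is a minimal C sketch (the function names are hypothetical). Both functions below satisfy the same functional specification, so a proof of input-output correctness cannot distinguish them, yet the first one leaks the position of the first mismatching byte through its running time:

```c
#include <stddef.h>
#include <stdint.h>

/* Functionally correct, but returns as soon as a byte differs, so
 * execution time depends on the secret: a timing side channel that
 * a proof of the input-output behavior is blind to. */
int leaky_equal(const uint8_t *secret, const uint8_t *guess, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (secret[i] != guess[i])
            return 0;  /* early exit: time reveals how many bytes matched */
    }
    return 1;
}

/* Same input-output behavior, but touches every byte and avoids
 * secret-dependent branches, so the running time is (ideally)
 * independent of the secret. */
int constant_time_equal(const uint8_t *secret, const uint8_t *guess, size_t n) {
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= secret[i] ^ guess[i];
    return diff == 0;
}
```

And even the constant-time version only addresses the channel visible in the source: the cache and speculative-execution effects that Spectre and Meltdown exploit live below the level this C code can express, which is part of what makes the problem so resistant to source-level proof.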
In fact, I think survival timelines might even require anyone working on classes of software reliability that don’t relate to alignment to switch their focus to alignment at this point.
Does that include those working on software correctness and reliability in general, without a security focus? I would expect better tools for making software that is free of bugs, such as programs that carry correctness proofs as well as some of the lesser formal methods, to be on the critical path to survival, for the simple reason that any number of mundane programming mistakes in a supposedly-aligned AI could easily kill us all. I was under the impression that you agree with this [“Assurance Requires Formal Proofs”]. I expect formal proofs of security in particular to be largely a corollary of this: a C program that is proven to correctly accomplish any particular goal will necessarily not contain any buffer overflows, since a buffer overflow invokes undefined behavior, which would keep the proof from going through. This does not necessarily apply to all security properties, but I would expect it to apply to most of them.
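As a rough illustration of that corollary (a sketch in ACSL, the specification language of the Frama-C verifier; the function and its contract are hypothetical examples): to prove that even a simple copy routine meets its contract, the verifier must discharge a memory-safety obligation for every access, so a buffer overflow surfaces as an unprovable goal rather than a latent bug.

```c
#include <stddef.h>

/*@ requires \valid(dst + (0 .. n-1));
    requires \valid_read(src + (0 .. n-1));
    requires \separated(dst + (0 .. n-1), src + (0 .. n-1));
    assigns dst[0 .. n-1];
    ensures \forall integer i; 0 <= i < n ==> dst[i] == src[i];
*/
void copy_bytes(char *dst, const char *src, size_t n) {
    /*@ loop invariant 0 <= i <= n;
        loop invariant \forall integer k; 0 <= k < i ==> dst[k] == src[k];
        loop assigns i, dst[0 .. n-1];
        loop variant n - i;
    */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];  /* a write to dst[n] here would leave an
                             undischargeable proof obligation */
}
```

Note that the security property (“no out-of-bounds writes”) is never stated explicitly; it falls out of the obligation to show that every access stays within the memory the precondition guarantees valid.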
Yes, I agree with this.