Agree. The product stack has non-learned components in it. Let’s throw the kitchen sink at those components.
Some techniques that go under the banner of formal verification are not applied type theory (deduction); they are usually in the model-checking family, or maybe abstract interpretation. For the record, I only have applied type theory experience: I’m a Coq dev, and I’ve spent maybe a cumulative two hours over the past three years pondering abstract interpretation or model checking. That’s why I said “formal proof” in the title instead of “formal verification”. I could have called it “proof engineering” or “applied type theory” in the title, but I didn’t.
In order for there to be alpha in applied type theory for the learned components, a great deal needs to fall into place: mostly in mechinterp, then some moderate applied type theory work, then a whole lot of developer-experience / developer-ecosystem work. That’s what I believe.
Ok. So you’re working from the perspective of “can we make a model interpretable and formally prove properties about it?”
I am working from the perspective of “obviously you can’t make a competitive NN model with billions of weights provable, but if you optimize it in certain ways you could build an overall machine with a policy trapped in a local minimum.”
Then, with knowledge of what sorts of inputs produce stable and reliable outputs, measured empirically in simulation, let’s automatically shut down the machine when it leaves the “safe” states.
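A minimal sketch of what that runtime monitor might look like. The envelope variables, their ranges, and the shutdown action here are all made up for illustration; in practice the envelope would be derived from the empirical measurements in simulation described above.

```python
# Hypothetical safe-state monitor wrapping a learned policy.
# SAFE_ENVELOPE is an assumed, illustrative set of ranges; the real one
# would be fitted from empirical simulation data.

SAFE_ENVELOPE = {
    "speed_mps": (0.0, 25.0),
    "sensor_confidence": (0.9, 1.0),
}

def in_safe_envelope(state: dict) -> bool:
    """True iff every monitored variable is inside its measured-safe range."""
    return all(lo <= state[k] <= hi for k, (lo, hi) in SAFE_ENVELOPE.items())

def step(state: dict, policy_action: str) -> str:
    # The monitor sits between the black-box policy and the actuators:
    # outside the validated region, override with a shutdown action.
    if not in_safe_envelope(state):
        return "SHUTDOWN"
    return policy_action
```

The monitor itself is small, non-learned code, which is exactly the kind of component you could throw the formal-proof kitchen sink at.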
And let’s stack probabilities in our favor. If we know black-box model A fails about 0.01 percent of the time on inputs from our test distribution, let’s train model B such that it often succeeds where A fails (empirically verifiable).
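A back-of-envelope sketch of the stacking arithmetic. The numbers are illustrative, not measured; the key assumption is the conditional failure rate of B given that A has failed, which is exactly what training B on A’s failure modes is meant to drive down (anti-correlated failures beat independent ones).

```python
# Illustrative numbers only; the real rates come from empirical measurement.
p_fail_A = 1e-4                  # A fails ~0.01% of the time on the test distribution
p_B_fails_given_A_fails = 0.2    # assumed: B was trained to cover A's failures

# If B is consulted exactly on the inputs where A fails, the combined
# machine fails only when both do:
p_fail_combined = p_fail_A * p_B_fails_given_A_fails  # ~2e-05 here, a 5x reduction
```

Note that if B’s failures were perfectly correlated with A’s, the conditional term would be 1 and stacking would buy nothing; the empirical verification step is what justifies the multiplication.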
The overall framework—with an A and a B model, and a whole ecosystem so the overall machine doesn’t segfault in a driver and kill people because nothing is in control of the hardware—is where you need formally proven code.
This kind of “down and dirty” engineering based on measurements is basically how we do it in most production software roles. It’s not useful to rely on theory alone, for the reasons you mention.