I agree that “safety in an open world cannot be proved,” at least as a general claim, but disagree that this impinges on the narrow challenge of designing cars that do not cause accidents. I tried to make that distinction clear, but evidently failed to do so, as Oliver’s misunderstanding illustrates. That said, I strongly agree that better methods for representing grain of truth problems, and for considering hypotheses outside those in the model, are critical. It’s a key reason I’m supporting work on infra-Bayesian approaches, which are designed explicitly to handle this class of problem. Again, this isn’t necessary for the very narrow challenge I think I addressed above, but I certainly agree that it’s critical.
Second, I’m a huge proponent of complex system engineering approaches, and have discussed this in previous unrelated work. I certainly agree that these issues are critical, and should receive more attention—but I think it’s counterproductive to try to embed difficult problems inside of addressable ones. To offer an analogy, creating provably safe code that isn’t vulnerable to any known technical exploit still will not prevent social engineering attacks, but we can still accomplish the narrow goal.
Suppose that, instead of writing code that can’t be fuzzed for vulnerabilities, doesn’t contain buffer overflow or null-pointer vulnerabilities, can’t be exploited via transient execution CPU vulnerabilities, and isn’t vulnerable to rowhammer attacks, you say that we need to address social engineering before trying to make the code provably safe, and that we should address social engineering with provable properties. Then you’re sabotaging progress in a tractable area in order to apply a paradigm ill-suited to the new problem you’re concerned with.
That’s why, in this piece, I started by saying I wasn’t proving anything general, and “I am making far narrower claims than the general ones which have been debated.” I agree that the larger points are critical. But for now, I wanted to make a simpler point.
I enthusiastically support doing everything we can to reduce vulnerabilities. As I indicated, a foundation model approach to representation learning would greatly improve the ability of systems to detect novelty. Perhaps we should rename the “provable safety” area as “provable safety modulo assumptions” area and be very explicit about our assumptions. We can then measure progress by the extent to which we can shrink those assumptions.
rename the “provable safety” area as “provable safety modulo assumptions” area and be very explicit about our assumptions.
Very much agree. I gave some feedback along those lines as the term was coined, and am sad it didn’t catch on. But of course “provable safety modulo assumptions” isn’t very short and catchy...
I do like the word “guarantee” as a substitute. We can talk of formal guarantees, but also of a store guaranteeing that an item you buy will meet a certain standard. So its connotations are nicely in the direction of proof but without, as it were, “proving too much” :)
That seems fair!
I really like that idea, and the clarity it provides, and have renamed the post to reflect it! (Sorry this was so slow; I’m travelling.)