Thank you for this post. You are following the standard safety engineering playbook. It is good at dealing with known hazards. However, it has traditionally been based entirely on failure probabilities of components (brakes, engines, landing gear, etc.). Extending it to handle perception is challenging. In particular, the perceptual system must accurately quantify its uncertainty so that the controller can act cautiously under uncertainty. For aleatoric uncertainty (i.e., in-distribution), this is a solved problem. But for epistemic uncertainty, there are still fundamental challenges. As you are well aware, epistemic uncertainty encompasses out-of-distribution / anomaly / novelty detection. These methods rely on having a representation that separates the unknowns from the knowns—the anomalies from the nominals. Given such separation, flexible density estimators can do the job of detecting novelty. However, I believe it is impossible to provide guarantees for such representations. The best we can do is ensure that the representations capture all known environmental variability, and for this, vision foundation models are a practical way forward. But there may be other technical advances that can strengthen the learned representations.
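To make the "flexible density estimator over a learned representation" recipe concrete, here is a minimal sketch (the embeddings, the Gaussian mixture, and the 1% threshold are all illustrative placeholders; the part that carries no guarantee is whether the representation actually separates the anomalies from the nominals):

```python
# Minimal sketch of density-based novelty detection on learned representations.
# Random arrays stand in for embeddings from a frozen vision foundation model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
nominal_train = rng.normal(size=(5000, 64))       # placeholder training embeddings
nominal_calib = rng.normal(size=(1000, 64))       # held-out nominal embeddings
test_batch = rng.normal(loc=3.0, size=(10, 64))   # placeholder "novel" inputs

# Flexible density estimator fit on the known (nominal) data only.
density = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
density.fit(nominal_train)

# Calibrate a rejection threshold: flag anything less likely than the
# bottom 1% of held-out nominal data.
threshold = np.quantile(density.score_samples(nominal_calib), 0.01)

is_novel = density.score_samples(test_batch) < threshold
print(is_novel)  # True where the controller should fall back to cautious behavior
```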
Another challenge goes beyond perception to prediction. You discussed predicting the behavior of other cars, but what about pedestrians, dogs, cats, toddlers, deer, kangaroos, etc.? New modes of personal transportation are marketed all the time, and the future trajectories of people using those modes are highly variable. This is another place where epistemic uncertainty must be quantified.
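One common practical proxy for this kind of epistemic uncertainty is ensemble disagreement. A toy sketch, with placeholder data and deliberately simple bootstrap-trained Ridge models standing in for real trajectory predictors:

```python
# Toy sketch: ensemble disagreement as a rough proxy for epistemic uncertainty
# in trajectory prediction. Data and models are placeholders only.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Placeholder dataset: past 4 timesteps of (x, y) positions -> position 1 s ahead.
past = rng.normal(size=(2000, 8))
future = past[:, -2:] + rng.normal(scale=0.2, size=(2000, 2))

# Train an ensemble on bootstrap resamples of the nominal data.
ensemble = []
for _ in range(10):
    idx = rng.integers(0, len(past), size=len(past))
    ensemble.append(Ridge(alpha=1.0).fit(past[idx], future[idx]))

def predict_with_uncertainty(history):
    preds = np.stack([m.predict(history[None, :])[0] for m in ensemble])
    return preds.mean(axis=0), preds.std(axis=0).max()  # mean prediction, disagreement

mean_pos, spread = predict_with_uncertainty(rng.normal(size=8))
# A planner could widen safety margins or yield when `spread` is large.
```

The catch is that for a genuinely novel road user, the ensemble members may all agree and all be wrong; that is exactly the epistemic gap that still needs to be closed.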
Bottom line: Safety in an open world cannot be proved, because it relies on perfect epistemic uncertainty quantification, which is impossible.
Finally, I highly recommend Nancy Leveson’s book, Engineering a Safer World. She approaches safety as a socio-technical control problem. It is not sufficient to design and prove that a system is safe. Rather, safety is a property that must be actively maintained against disturbances. This requires post-deployment monitoring, detection of anomalies and near misses, diagnosis of the causes of those anomalies and near misses, and updates to the system design to accommodate them. One form of disturbance is novelty, as described above. But the biggest threat is often budget cuts, staff layoffs, and staff turnover in the organization responsible for post-deployment maintenance. She documents a recurring phenomenon—that complex systems tend to migrate toward increased risk because of these disturbances. A challenge for all of us is to figure out how to maintain robust socio-technical systems that in turn can maintain the safety of deployed systems.

--Tom Dietterich
I agree that “safety in an open world cannot be proved,” at least as a general claim, but disagree that this impinges on the narrow challenge of designing cars that do not cause accidents—a distinction I tried to make clear, but evidently did not make clear enough, as Oliver’s misunderstanding illustrates. That said, I strongly agree that better methods for representing grain-of-truth problems, and for considering hypotheses outside those in the model, are critical. That’s a key reason I’m supporting work on infra-Bayesian approaches, which are designed explicitly to handle this class of problem. Again, they aren’t necessary for the very narrow challenge I think I addressed above, but I certainly agree they’re critical.
Second, I’m a huge proponent of complex system engineering approaches, and have discussed this in previous, unrelated work. I certainly agree that these issues are critical and should receive more attention—but I think it’s counterproductive to embed difficult problems inside addressable ones. To offer an analogy: creating provably safe code that isn’t vulnerable to any known technical exploit will not prevent social engineering attacks, but we can still accomplish the narrow goal.
If, instead of writing code that can’t be fuzzed for vulnerabilities, contains no buffer-overflow or null-pointer bugs, can’t be exploited via transient-execution CPU vulnerabilities, and isn’t vulnerable to rowhammer attacks, you insist that we must address social engineering before trying to make the code provably safe, and that social engineering should itself be addressed with provable properties, then you’re sabotaging progress in a tractable area in order to apply a paradigm ill-suited to the new problem you’re concerned with.
That’s why, in this piece, I started by saying I wasn’t proving anything general, and “I am making far narrower claims than the general ones which have been debated.” I agree that the larger points are critical. But for now, I wanted to make a simpler point.
I enthusiastically support doing everything we can to reduce vulnerabilities. As I indicated, a foundation model approach to representation learning would greatly improve the ability of systems to detect novelty. Perhaps we should rename the “provable safety” area as “provable safety modulo assumptions” area and be very explicit about our assumptions. We can then measure progress by the extent to which we can shrink those assumptions.
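To illustrate what “modulo assumptions” could look like in practice, here is a toy sketch in Lean (every name in it is invented for the example): the unverifiable piece, soundness of perception under novelty, stays as an explicit hypothesis of the safety theorem, and progress means weakening or discharging it.

```lean
-- Toy sketch of "provable safety modulo assumptions" (all names invented).
-- The assumptions are explicit hypotheses; shrinking them is measurable progress.

variable (World Action : Type)
variable (perceivedSafe : World → Action → Prop)  -- what the perception stack reports
variable (actuallySafe : World → Action → Prop)   -- the property we actually care about
variable (policy : World → Action)                -- the controller

theorem safe_modulo_assumptions
    (A1 : ∀ w a, perceivedSafe w a → actuallySafe w a)  -- perception is sound: no guarantee under novelty
    (A2 : ∀ w, perceivedSafe w (policy w)) :            -- controller only acts when perception says "safe"
    ∀ w, actuallySafe w (policy w) :=
  fun w => A1 w (policy w) (A2 w)
```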
rename the “provable safety” area as “provable safety modulo assumptions” area and be very explicit about our assumptions.
Very much agree. I gave some feedback along those lines when the term was coined, and am sad it didn’t catch on. But of course “provable safety modulo assumptions” isn’t very short and catchy...
I do like the word “guarantee” as a substitute. We can talk of formal guarantees, but also of a store guaranteeing that an item you buy will meet a certain standard. So its connotations are nicely in the direction of proof but without, as it were, “proving too much” :)
That seems fair!
I really like that idea, and the clarity it provides, and have renamed the post to reflect it! (Sorry this was so slow; I’m travelling.)