David, I really like that your description discusses the multiple interacting levels involved in self-driving cars (e.g., software, hardware, road rules, laws, etc.). Actual safety requires reasoning about those interacting levels and ensuring that the system as a whole doesn’t have vulnerabilities. Malicious humans and AIs are likely to try to exploit systems in unfamiliar ways. For example, here are 10 potential harmful actions (out of many, many more possibilities!) from which AI-enabled malicious humans or AIs could benefit, all of which involve controlling self-driving cars in harmful ways:
1) Murder for hire: Cause accidents that kill a car’s occupant, another car’s occupant, or pedestrians
2) Terror for hire: Cause accidents that injure groups of pedestrians
3) Terror for hire: Control vehicles to deliver explosives or disperse pathogens
4) Extortion: Demand payment to not destroy a car, harm the driver, or harm others
5) Steal delivery contents: Take over delivery vehicles to steal their contents
6) Steal car components: Take over cars to steal components like catalytic converters
7) Surreptitious delivery or taxis: Control vehicles to earn money for illegal deliveries or use in illegal activities
8) Insurance fraud for hire: Cause damage to property for insurance fraud
9) Societal distraction for hire: Cause multiple crashes to overwhelm the societal response
10) Blocked roads for hire: Control multiple vehicles to block roadways for social harm or extortion
To prevent these and many related harms, safety analyses must integrate across multiple levels. We need formal models of the relevant dynamics at each level and proofs of adherence to overall safety criteria. For societal trust in the actual safety of a self-driving vehicle, I believe manufacturers will need to provide machine-checkable “certificates” of these analyses.
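To make the “certificates” idea concrete, here is a minimal, purely illustrative sketch: a toy discretized speed-control model (the dynamics and names are assumptions of mine, not anyone’s actual verification stack) in which the certificate is a claimed inductive invariant, and checking the certificate amounts to confirming three simple conditions rather than redoing the analysis that produced it.

```python
# Purely illustrative: a toy, discretized speed-control model (my own assumed
# dynamics) plus an inductive-invariant "certificate". The manufacturer would
# supply the invariant; anyone can re-check the three conditions below without
# redoing the analysis that produced it.

SPEED_LIMIT = 5  # safety property: speed never exceeds this value


def initial_states():
    return {(0, False)}  # (speed, braking?)


def successors(state):
    """One step of the assumed dynamics."""
    speed, braking = state
    if braking:
        new_speed = max(speed - 2, 0)
        return {(new_speed, new_speed > 0)}
    if speed < SPEED_LIMIT:
        return {(speed + 1, False), (speed, False)}  # accelerate or hold
    return {(speed, True)}  # at the limit, the controller must brake


def safe(state):
    return state[0] <= SPEED_LIMIT


# The "certificate": a claimed inductive invariant, here just an explicit set.
INVARIANT = {(s, b) for s in range(SPEED_LIMIT + 1) for b in (True, False)}


def check_certificate():
    assert initial_states() <= INVARIANT      # 1) initial states are covered
    for st in INVARIANT:                      # 2) closed under the dynamics
        assert successors(st) <= INVARIANT
    assert all(safe(st) for st in INVARIANT)  # 3) implies the safety property
    return True


print(check_certificate())  # True: the certificate establishes the speed bound
```

The asymmetry is the point: finding the invariant may require heavy analysis, but re-checking it is cheap and mechanical, which is what would let regulators or the public audit a manufacturer’s claim without trusting the manufacturer’s tools.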
I think these are all really great things that we could formalize and build guarantees around. I think some of them are already ruled out by the responsibility-sensitive safety guarantees, but others certainly are not. On the other hand, I don’t think that using cars to do things that violate laws completely unrelated to vehicle behavior is in scope; similar to what I mentioned to Oliver, if what is needed for a system to be safe is that nothing bad can be done with it, you’re heading in the direction of claiming that the only safe AI is a universal dictator with sufficient power to control all outcomes.
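For context, responsibility-sensitive safety (RSS) guarantees are built from explicit rules such as a minimum safe following distance. Here is a rough sketch of that formula (Shalev-Shwartz, Shammah & Shashua, 2017); the parameter values are illustrative assumptions of mine, not numbers mandated by RSS or by any manufacturer.

```python
# Rough sketch of the RSS minimum safe longitudinal distance. The parameter
# values are illustrative assumptions, not RSS-mandated numbers.

def rss_min_safe_distance(v_rear, v_front, rho=0.5,
                          a_max_accel=3.0, a_min_brake=4.0, a_max_brake=8.0):
    """Minimum gap (metres) the rear car must keep: even if it accelerates at
    a_max_accel for its response time rho and then brakes only at a_min_brake,
    while the front car brakes at a_max_brake, the two must not collide."""
    v_rear_after_response = v_rear + rho * a_max_accel
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + v_rear_after_response ** 2 / (2 * a_min_brake)
         - v_front ** 2 / (2 * a_max_brake))
    return max(d, 0.0)


# Example: both cars travelling at 20 m/s (about 72 km/h).
print(round(rss_min_safe_distance(20.0, 20.0), 1))  # about 43.2 m with these assumed parameters
```

Roughly, a car that always preserves at least this gap will not be the cause of a rear-end collision, which is the sense in which guarantees of this kind already rule out some, though not all, of the scenarios above.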
But in cases where provable safety guarantees are in place and the issues relate to car behavior, such as cars causing damage, blocking roads, or being redirected away from the intended destination, I think hardware guarantees on the system, combined with software guarantees and with verification that only trusted code is being run, could be used to ignition-lock cars that have been subverted.
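As a minimal sketch of that “won’t start unless only trusted code is running” idea (all names and digest values below are placeholders I made up): a hardware root of trust measures the installed software at boot, and ignition is enabled only if the measurement matches a manufacturer’s allow-list; a real system would use a secure element and signature verification rather than a hash list in Python.

```python
# Minimal sketch of the "only trusted code runs, or the car won't start" idea.
# All names and digest values are placeholders; a real system would rely on a
# secure element / TPM and manufacturer signature verification, not a bare
# hash allow-list baked into Python.

import hashlib
import hmac

# Assumed allow-list of trusted firmware image digests (placeholder values).
TRUSTED_FIRMWARE_SHA256 = {
    bytes.fromhex("aa" * 32),  # e.g. a signed release build
    bytes.fromhex("bb" * 32),  # e.g. the previous signed release
}


def measure_firmware(image: bytes) -> bytes:
    """Stand-in for a hardware-measured boot digest of the installed software."""
    return hashlib.sha256(image).digest()


def ignition_permitted(image: bytes) -> bool:
    """Enable ignition only if the measured software matches a trusted digest."""
    measured = measure_firmware(image)
    return any(hmac.compare_digest(measured, trusted)
               for trusted in TRUSTED_FIRMWARE_SHA256)


# A tampered image fails the check, so a subverted car refuses to start.
print(ignition_permitted(b"tampered firmware image"))  # False
```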
And I think that in the remainder of cases, where cars are being used for dangerous or illegal purposes, we need to trade off freedom and safety. I certainly don’t want AI systems which can conspire to break the law—and in most cases, I expect that this is something LLMs can already detect—but I also don’t want a car which will not run if it determines that a passenger is guilty of some unrelated crime like theft. But for things like “deliver explosives or disperse pathogens,” I think vehicle safety is the wrong path to preventing dangerous behavior; it seems far more reasonable to have separate systems that detect terrorism, and separate types of guarantees to ensure LLMs don’t enable that type of behavior.