The version of your concern that I endorse is that this framework wouldn't work very well in worlds where warning shots are rare (or, more to the point, are expected to be rare by the key decision-makers). It can deter large incidents, but only those associated with smaller, more likely incidents. If the threat models you're most worried about are unlikely to produce such near-misses, then the expectation of liability is unlikely to be a sufficient deterrent. It's not clear to me that there are politically viable policies that would significantly mitigate those kinds of risks, but I plan to address that question more deeply in future work.
I really don't have any sense of the expected prevalence of warning shots. Ideally, of course, I'd like a policy that both increases the likelihood of (or at least doesn't disincentivize) small, early warning shots on paths that would otherwise lead to large incidents, and also disincentivizes all bad outcomes, so that companies want to avoid them.
The idea behind my framework is that punitive damages would only be available to the extent that the most cost-effective risk mitigation measures the AI system developer/deployer could have taken to further reduce the likelihood and/or severity of the practically compensable harm would also tend to mitigate the uninsurable risk. I agree that there's a potential Goodhart problem here, in that the prospect of liability could give AI companies strong incentives to eliminate warning shots without doing very much to mitigate the catastrophic risk. For this reason, I think it's really important that the punitive damages formula put heavy weight on the elasticity of the particular practically compensable harm at issue with respect to the associated uninsurable risk.
Thanks, that makes sense.