I agree with the overall conclusion that the burden of proof should be on the side of the AGI companies.
However, using the FDA as a reference or example might not be ideal, because it has historically gotten cost-benefit trade-offs wrong many times, e.g. by not permitting medicines that were comparatively safe and highly effective.
So if AIS evals/auditing becomes associated with or seen as similar to the FDA, we might not make too many friends. Overall, I think it would be fine if the AIS auditing community is seen as generally cautious, but it should not give the impression of failing to update on relevant evidence.
If I were to choose a model or reference class for AI auditing, I would probably choose the aviation industry, which seems to be pretty competent and well-regarded.
That seems like an excellent angle on the issue—I agree that reference models, and stakeholders’ different attitudes towards them, likely have a huge impact. As such, the criticisms the FDA faces might indeed be an issue (at least, that’s how I understand your comment).
However, I’d carefully offer a bit of pushback on the aviation industry as an example, keeping in mind the difficult tradeoffs and diverging interests regulators will face in designing an approval process for AI systems. I think the problems that AI regulators will face are more similar to those of the FDA, and policymakers (if you assume they are your audience) might be more comfortable with a model that can somewhat withstand these problems.
Below is my reasoning (with a bit of overstatement/political rhetoric, e.g., “risking people’s lives”):
As you highlighted, the FDA faces substantial criticism for being too cautious, e.g., the Covid vaccines took longer to approve in the US than in the UK. Not permitting a medicine that would have been comparatively safe and highly effective, i.e., a false negative, means forgoing the profound positive impact that medicine could have had on people’s lives. And beyond the public interest, industry has significant financial interests in getting drugs through too. In a similar vein, I expect that AI regulators will face quite some pushback when “slowing down” innovation, i.e., not approving a model. On the other side, pushing drugs through the pipeline too quickly is also commonly criticized (e.g., the recent Alzheimer’s drug approval as a false positive example). Even more so, losing its reputation as a trustworthy regulator has many knock-on effects (e.g., will people trust an FDA-approved vaccine in the future?). Since both being too cautious and being too aggressive carry potentially high costs to people’s lives, striking the right balance is incredibly difficult.
The aviation regulator also faces a tradeoff, but I would argue one side is inherently “weaker” than the other (for lack of a better description). If something bad happens, there are huge reputational costs to the regulator for having invested “too little” in safety. A false negative error, however (i.e., overestimating the level of caution required and demanding more safety than necessary), does not necessarily damage the regulator’s reputation; there are more or less only economic costs. And most people seem to be okay with high safety standards in aviation. In other words (simplified): “overinvesting” in safety comes at an economic cost, while “underinvesting” in safety comes at reputational costs to the regulator and potentially costs people’s lives.
My guess is that the reputational risks (and competing goals) that AI regulators will face, in particular with regard to false negatives, are similar to those of the FDA. They will be seen as either too cautious/interventionist/innovation-hampering or too aggressive, if not both. Aviation safety is (in my perception) rarely seen as too cautious, or at least that is not something that gets routinely criticised by the public.
Policymakers—especially those currently “battling big tech”—are quite well aware of the tradeoffs they will face and the breadth of stakeholders involved. As such, using an example that can withstand the reputational costs of applying too much caution might be a bit more powerful in some cases. In a similar vein, the FDA model has been probed much more with regard to regulatory capture (not getting a single drug approved is incredibly costly for one firm rather than for the whole industry, while industry-wide costs from safety restrictions in aviation can be passed on to consumers).
Nonetheless, I completely understand the concern that “we might not make too many friends,” particularly among those focused on typical “pro-innovation considerations” or industry interests, and I agree it makes sense to use this example with some caution.
This makes sense. Can you say more about how aviation regulation differs from the FDA?
In other words, are there meaningful differences in how the regulatory processes are set up? Or does it just happen to be the case that the FDA has historically been worse at responding to evidence compared to the Federal Aviation Administration?
(I think it’s plausible that we would want a structure similar to the FDA even if the particular individuals at the FDA were bad at cost-benefit analysis, unless there are arguments that the structure of the FDA caused the bad cost-benefit analyses).
So far, I haven’t looked into it in detail, and I’m only relaying other people’s testimonials. I intend to dive deeper into these fields soon and will let you know when I have a better understanding.