IMO it’s unlikely that we’re ever going to have a safety case that’s as reliable as the nuclear physics calculations that showed that the Trinity Test was unlikely to ignite the atmosphere (where my impression is that the risk was mostly dominated by risk of getting the calculations wrong). If we have something that is less reliable, then will we ever be in a position where only considering the safety case gives a low enough probability of disaster for launching an AI system beyond the frontier where disastrous capabilities are demonstrated?
Thus, in practice, decisions will probably not be made on a safety case alone, but also based on some positive case of the benefits of deployment (e.g. estimated reduced x-risk, advancing the “good guys” in the race, CEO has positive vibes that enough risk mitigation has been done, etc.). It’s not clear what role governments should have in assessing this; maybe we can only get assessment of the safety case. But it’s useful to note that safety cases won’t be the only thing that informs these decisions.
This situation is pretty disturbing, and I wish we had a better way, but it still seems useful to push the positive benefit case more towards “careful argument about reduced x-risk” and away from “CEO vibes about whether enough mitigation has been done”.
Relevant paper discussing the risk of risk assessments being wrong due to theory/model/calculation error: “Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes”.
Based on the current vibes, I think that paper suggests that methodological errors alone will lead to a significant chance of significant error for any safety case in AI.
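To make that concrete, here is a minimal sketch of the kind of decomposition that paper argues for: the overall probability of disaster is the safety case's own estimate weighted by the chance the argument is sound, plus the risk you face if the argument is flawed. The function name and all of the numbers are hypothetical, chosen only for illustration; they are not estimates for any real system.

```python
def disaster_probability(p_argument_sound, p_disaster_if_sound, p_disaster_if_flawed):
    """Total probability of disaster, allowing for the safety argument itself being wrong.

    Law of total probability over "argument sound" vs. "argument flawed".
    """
    p_flawed = 1.0 - p_argument_sound
    return p_argument_sound * p_disaster_if_sound + p_flawed * p_disaster_if_flawed


# Hypothetical inputs, for illustration only.
claimed = 1e-6  # what the safety case says, if its reasoning holds
total = disaster_probability(
    p_argument_sound=0.99,      # assumed chance the theory/model/calculation is right
    p_disaster_if_sound=claimed,
    p_disaster_if_flawed=0.1,   # assumed risk if the argument is wrong
)
print(f"claimed by the safety case: {claimed:.0e}")
print(f"after allowing for a flawed argument: {total:.4f}")
```

With these made-up inputs the total comes out near 1 in 1,000, almost all of it from the flawed-argument term, so the headline number in the safety case barely matters once there is a 1% chance the argument is wrong.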
I agree that “CEO vibes about whether enough mitigation has been done” seems pretty unacceptable.
I agree that in practice, labs probably will go forward with deployments that have >5% probability of disaster; I think it’s pretty plausible that the lab will be under extreme external pressure (e.g. some other country is building a robot army that’s a month away from being ready to invade) that causes me to agree with the lab’s choice to do such an objectively risky deployment.
Would be nice if it was based on “actual robot army was actually being built and you have multiple confirmatory sources and you’ve tried diplomacy and sabotage and they’ve both failed” instead of “my napkin math says they could totally build a robot army bro trust me bro” or “they totally have WMDs bro” or “we gotta blow up some Japanese civilians so that we don’t have to kill more Japanese civilians when we invade Japan bro” or “dude I’m seeing some missiles on our radar, gotta launch ours now bro”.