aysja comments on Zach Stein-Perlman’s Shortform

aysja 23 Jul 2024 21:46 UTC
6 points
I agree that scoring “medium” seems like it would imply crossing into the medium zone, although I think what they actually mean is “at most medium.” The full quote (from above) says:
In other words, if we reach (or are forecasted to reach) at least “high” pre-mitigation risk in any of the considered categories, we will not continue with deployment of that model (by the time we hit “high” pre-mitigation risk) until there are reasonably mitigations in place for the relevant postmitigation risk level to be back at most to “medium” level.
I.e., I think what they’re trying to say is that they have different categories of evals, each of which might pass different thresholds of risk. If any of those are “high,” then they’re in the “medium zone” and they can’t deploy. But if they’re all medium, then they’re in the “below medium zone” and they can. This is my current interpretation, although I agree it’s fairly confusing and it seems like they could (and should) be clearer about it.
- Zach Stein-Perlman 23 Jul 2024 21:53 UTC
  4 points
  Surely if any categories are above the “high” threshold then they’re in “high zone” and if all are below the “high” threshold then they’re in “medium zone.”
  And regardless, the reading you describe here seems inconsistent with:
  We won’t release a new model if it crosses a “Medium” risk threshold from our Preparedness Framework, until we implement sufficient safety interventions to bring the post-mitigation score back to “Medium”.
  [edited]
  Added later: I think someone else had a similar reading and it turned out they were reading “crosses a medium risk threshold” as “crosses a high risk threshold” and that’s just [not reasonable / too charitable].