Just chiming in that I appreciate this post, and my independent impressions of reading the FSF align with Zach’s conclusions: weak and unambitious.
A couple additional notes:
The thresholds feel high — 6⁄7 of the CCLs feel like the capabilities would be a Really Big Deal in prosaic terms, and ~4 feel like a big deal for x-risk. But you can’t say whether the thresholds are “too high” without corresponding safety mitigations, which this document doesn’t have. (Zach)
These also seemed pretty high to me, which is concerning given that they are “Level 1”. This doesn’t necessarily imply, but it does hint, that no substantial mitigations beyond the current level will be required until those capability levels are reached. My guess is that current jailbreak prevention is insufficient to mitigate substantial risk from models that are a little under the Level 1 capabilities for, e.g., bio.
GDM gets props for specifically identifying ML R&D and “hyperbolic growth in AI capabilities” as a source of risk.
Given the lack of commitments, it’s also somewhat unclear what scope to expect this framework to eventually apply to. GDM is a large org with, presumably, multiple significant general AI capabilities projects. Especially given that “deployment” refers to external deployment, it seems like there will be substantial work involved in ensuring that all the internal AI development projects proceed safely. For example, when/if there are ≥3 major teams and dozens of research projects fine-tuning highly capable models (e.g., a base model just below Level 1), compliance may be quite difficult. But this all depends on what the actual commitments and mechanisms turn out to be. This comes to mind after this event a few weeks ago, where it looks like a team at Microsoft released a model without following all internal guidelines and then tried to unrelease it (but I could be confused).