Any mention of what the “mitigations” in question would be?
Not really. For now, OpenAI mostly mentions restricting deployment (this section is pretty disappointing):
A central part of meeting our safety baselines is implementing mitigations to address various types of model risk. Our mitigation strategy will involve both containment measures, which help reduce risks related to possession of a frontier model, as well as deployment mitigations, which help reduce risks from active use of a frontier model. As a result, these mitigations might span increasing compartmentalization, restricting deployment to trusted users, implementing refusals, redacting training data, or alerting distribution partners.
They mention three types of mitigations:
Asset protection (e.g., restricting access to models to a limited set of named people, and general infosec)
Restricting deployment (only models with a risk score of “medium” or below can be deployed)
Restricting development (models with a risk score of “critical” cannot be developed further until safety techniques have been applied that bring the score back down to “high.” That said, OpenAI largely gets to decide when it thinks its safety techniques have worked sufficiently well; a rough sketch of these gating rules is below.)
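For concreteness, here is a minimal sketch of how those risk-score thresholds gate deployment and further development, as I read them. The RiskScore scale and function names are my own illustration, not code or terminology from OpenAI’s framework, and I’m modeling only the post-mitigation score:

```python
# Illustrative sketch only, not OpenAI's actual implementation:
# deployment is gated at "medium" or below, further development at "high" or below.
from enum import IntEnum


class RiskScore(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def can_deploy(post_mitigation_score: RiskScore) -> bool:
    """Deployment is allowed only at a score of 'medium' or below."""
    return post_mitigation_score <= RiskScore.MEDIUM


def can_continue_development(post_mitigation_score: RiskScore) -> bool:
    """Further development is blocked at 'critical' until mitigations
    bring the score down to 'high' or lower."""
    return post_mitigation_score <= RiskScore.HIGH


if __name__ == "__main__":
    score = RiskScore.CRITICAL
    print(can_deploy(score))                # False
    print(can_continue_development(score))  # False: must first mitigate down to HIGH
```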
My one-sentence reaction after reading the doc for the first time is something like “it doesn’t really tell us how OpenAI plans to address the misalignment risks that many of us are concerned about, but with that in mind, it’s actually a fairly reasonable document with some fairly concrete commitments.”
I predict that if you read the doc carefully, you’d say “probably net-harmful relative to just not pretending to have a safety plan in the first place.”