It’s so much better if everyone in the company can walk around and tell you what are the top goals of the RSP, how do we know if we’re meeting them, what AI safety level are we at right now—are we at ASL-2, are we at ASL-3—that people know what to look for because that is how you’re going to have good common knowledge of if something’s going wrong.
I like this goal a lot: Good RSPs could contribute to building common language/awareness around several topics (e.g., “if” conditions, “then” commitments, how safety decisions will be handled). As many have pointed out, though, I worry that current RSPs haven’t been concrete or clear enough to build this kind of understanding/awareness.
One interesting idea would be to survey company employees and evaluate their understanding of RSPs & the extent to which RSPs are having an impact on internal safety culture. Example questions/topics:
What is the high-level purpose of the RSP?
Does the RSP specify “if” triggers (specific thresholds that, if hit, could cause the company to stop scaling or deployment activities)? If so, what are they?
Does the RSP specify “then” commitments (specific actions that must be taken in order for the company to continue scaling or deployment activities)? If so, what are they?
Does the RSP specify how decisions about risk management will be made? If so, how will they be made & who are the key players involved?
Are there any ways in which the RSP has affected your work at Anthropic? If so, how?
One of my concerns about RSPs is that they (at least in their current form) don’t actually achieve the goal of building common knowledge/awareness or improving company culture. I suspect surveys like this could prove me wrong, and, more importantly, provide scaling companies with useful information about how well their scaling policies are understood by employees and help foster common understanding.
(Another version of this could involve giving multiple RSPs to a third party, like an AI Safety Institute, and having them answer similar questions. This could provide another useful datapoint regarding the extent to which RSPs clearly and concretely lay out specific and meaningful commitments.)