We just disagree. E.g. you “walked away with a much better understanding of how OpenAI plans to evaluate & handle risks than how Anthropic plans to handle & evaluate risks”; I felt like Anthropic was thinking about most stuff better.
I think Anthropic’s ASL-3 is reasonable and OpenAI’s thresholds and corresponding commitments are unreasonable. If the ASL-4 threshold were set too high, or the commitments were weak enough that ASL-4 was meaningless, I agree Anthropic’s RSP would be at least as bad as OpenAI’s.
One thing I think is a big deal: Anthropic’s RSP treats internal deployment like external deployment; OpenAI’s has almost no protections for internal deployment.
I agree “an initial RSP that mostly spells out high-level reasoning, makes few hard commitments, and focuses on misuse while missing the all-important evals and safety practices for ASL-4” is also a fine characterization of Anthropic’s current RSP.
Quick edit: OpenAI’s PF thresholds are too high, and the PF seems doomed / not on track. Anthropic’s RSPv1, by contrast, is consistent with RSPv1.1 being great; at least Anthropic knows and says there’s a big hole. That’s not super relevant to evaluating the labs’ current commitments, but it is very relevant to predicting their future ones.
I agree with ~all of your subpoints, but it seems like we disagree on the overall appraisal.
Thanks for explaining your overall reasoning though. Also big +1 that the internal deployment stuff is scary. I don’t think either lab has told me what protections they’re going to use for internally deploying dangerous (~ASL-4) systems, but the fact that Anthropic treats internal deployment like external deployment is a good sign. OpenAI at least acknowledges that internal deployment can be dangerous through its distinction between high risk (can be internally deployed) and critical risk (cannot be), but I agree that the thresholds are too high, particularly for model autonomy.
Sorry for brevity.