dmz comments on Anthropic rewrote its RSP

dmz 16 Oct 2024 17:29 UTC
14 points
(I work on the Alignment Stress-Testing team at Anthropic and have been involved in the RSP update and implementation process.)
Re not believing Anthropic’s statement:
“we believe the risk of substantial under-elicitation is low”
To be more precise: there was significant under-elicitation, but the distance to the thresholds was large enough that the risk of crossing them even with better elicitation was low.