Our evaluations indicate that we have very poor levels of safety (e.g., the AI would probably be able to escape if it wanted to), and we can't find countermeasures that suffice to ensure any meaningful level of safety (without basically giving up on using this model).