Zach Stein-Perlman comments on Model evals for dangerous capabilities

Zach Stein-Perlman 15 Oct 2024 18:30 UTC
2 points
Correction:
On whether Anthropic uses chain-of-thought in its evals, I said “Yes, but not explicit, but implied by discussion of elicitation in the RSP and RSP evals report.” This impression was also based on conversations with relevant Anthropic staff members. Today, Anthropic says “Some of our evaluations lacked some basic elicitation techniques such as best-of-N or chain-of-thought prompting.” Anthropic also says it believes the elicitation gap is small, which is in tension with previous statements.