Correction:
On whether Anthropic uses chain-of-thought, I said “Yes, but not explicit, but implied by discussion of elicitation in the RSP and RSP evals report.” This impression was also based on conversations with relevant Anthropic staff members. Today, Anthropic says “Some of our evaluations lacked some basic elicitation techniques such as best-of-N or chain-of-thought prompting.” Anthropic also says it believes the elicitation gap is small, which is in tension with previous statements.