Brendan Long comments on Anthropic releases Claude 3.7 Sonnet with extended thinking mode

Brendan Long 24 Feb 2025 21:32 UTC
6 points
0
During our evaluations we noticed that Claude 3.7 Sonnet occasionally resorts to special-casing in order to pass test cases in agentic coding environments like Claude Code. Most often this takes the form of directly returning expected test values rather than implementing general solutions, but also includes modifying the problematic tests themselves to match the code’s output.
Claude officially passes the junior engineer Turing Test?