Apropos of nothing, I’m reminded of the “<antthinking>” tags originally observed in Sonnet 3.5’s system prompt, and this section of Dario’s recent essay (bolding mine):
> In 2024, the idea of using reinforcement learning (RL) to train models to generate chains of thought has become a new focus of scaling. Anthropic, DeepSeek, and many other companies (perhaps most notably OpenAI, who released their o1-preview model in September) have found that this training greatly increases performance on certain select, objectively measurable tasks like math, coding competitions, and on reasoning that resembles these tasks.