The length of Claude 3.7 Sonnet scratchpads can be controlled, and the same was true of the early snapshot I used in these experiments, though all scratchpads used here had a relatively high thinking budget. I think these experiments mostly rule out the position some people held that there would be a lot of hidden reasoning even under relatively mild length control, because English is very inefficient compared to steganography.
I agree it would be interesting to run the same experiments on scratchpads with a smaller thinking budget.
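For concreteness, here is a minimal sketch of how a smaller thinking budget could be requested via the public Anthropic API. The `thinking` parameter and model ID reflect the released Claude 3.7 Sonnet; the early snapshot used in the experiments may have been configured differently:

```python
# Minimal sketch: requesting a Claude 3.7 Sonnet scratchpad with a small
# thinking budget via the public Anthropic API (an assumption about the
# released model, not necessarily the early snapshot used here).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},  # 1024 is the API minimum
    messages=[{"role": "user", "content": "Solve: what is 27 * 43?"}],
)

# The scratchpad comes back as a "thinking" block before the final answer.
for block in response.content:
    if block.type == "thinking":
        print(block.thinking)
```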
I agree paraphrasing with Claude 3.5 Sonnet is not ideal, and I’d be curious to see what the results would be when paraphrasing with other models. Just noting that you need to make sure the paraphraser is smart enough to actually agree to paraphrase long text (it was very hard to get Claude 3.5 Haiku to do that, which is why I gave up and focused on Claude 3.5 Sonnet paraphrases). I am not very worried: I’d be very surprised if Claude 3.5 Sonnet were able to recognize and preserve the encoded reasoning Claude 3.7 Sonnet uses. But I am curious how the exact paraphrasing scheme (both the model and the prompt) influences performance.
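As a sketch of what swapping in a different paraphraser might look like: the prompt, model ID, and `paraphrase` helper below are illustrative assumptions, not the setup actually used in the experiments:

```python
# Minimal sketch of the paraphrasing step with a swappable paraphraser model.
# The prompt and default model ID are illustrative; the actual experiments
# may have used a different prompt and a different Claude 3.5 Sonnet snapshot.
import anthropic

client = anthropic.Anthropic()

PARAPHRASE_PROMPT = (
    "Paraphrase the following reasoning transcript sentence by sentence. "
    "Preserve the meaning and all intermediate results, but change the "
    "wording and sentence structure as much as possible.\n\n{scratchpad}"
)

def paraphrase(scratchpad: str, model: str = "claude-3-5-sonnet-20241022") -> str:
    response = client.messages.create(
        model=model,
        max_tokens=8192,  # long scratchpads need a generous output budget
        messages=[
            {"role": "user", "content": PARAPHRASE_PROMPT.format(scratchpad=scratchpad)}
        ],
    )
    return response.content[0].text
```

Varying the `model` argument (and the prompt) is exactly the axis along which I would expect the paraphrasing scheme to influence performance.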
I agree some form of obfuscation is plausible in these scenarios. Deepseek was trained against a strong CoT monitor plus outcome-based feedback, and that setting would have been more interesting to study than math.
My intuition for why math is still interesting: AI companies (e.g. Deepseek / OpenAI) will do A LOT of RL on math / programming. How likely encoded reasoning is to emerge is a function of how much RL you do on a given task and how strong the incentives for encoded reasoning are. Currently, people don’t seem to be doing much optimization against the CoT in big reasoning RL runs, but they are doing lots of RL in settings where encoded reasoning may have been incentivized, because English might be inefficient (in both length and expressiveness). Therefore, big reasoning RL tasks like math are among the places where encoded reasoning is most likely to be observed, if models learned encoded reasoning during RL.