Huh. This is roughly what I’d expected, but even I didn’t expect it to be so underwhelming.[1]
I weakly predict that the situation isn’t quite as bad for capabilities as this makes it look. But I do think something-like-this is likely the case.
[1] Of course, moving a pass@400 capability to pass@1 isn’t nothing, but it’s clearly astronomically short of the Singularity-enabling technique that RL-on-CoTs is touted as.
Intuitively, this shouldn’t matter much. They use some RL-on-CoTs method that works, and I expect its effects are not fundamentally different from optimal methods’. Thus, optimal methods might yield better quantitative results, but similar qualitative results: maybe they’d let you elicit pass@800 capabilities instead of “just” pass@400, but it’d still be just pass@k elicitation for not-astronomical k.
Not strongly convinced of that, though.
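To put rough numbers on the pass@k framing, here is a toy sketch. It assumes every sample is independent and succeeds with the same fixed probability `p`, which is my simplification and not something measured in the results under discussion. The function names `pass_at_k` and `samples_needed` are hypothetical, made up for illustration.

```python
import math


def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent samples succeeds,
    assuming each sample succeeds with probability p (toy assumption)."""
    return 1.0 - (1.0 - p) ** k


def samples_needed(p: float, target: float = 0.5) -> int:
    """Smallest k such that pass_at_k(p, k) >= target, for 0 < p < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # A task the base model solves on 0.5% of attempts (illustrative number).
    p = 0.005
    for k in (1, 400, 800):
        print(f"pass@{k} = {pass_at_k(p, k):.3f}")
    print("samples for a 50% chance:", samples_needed(p))
```

Under this toy model, going from pass@400 to pass@800 only helps on tasks whose per-sample success rate was already in the rough vicinity of 1/400 to 1/800; anything far rarer than that stays out of reach either way, which is the sense in which the elicitation is bounded by "not-astronomical k".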