ryan_greenblatt comments on Stephen McAleese’s Shortform

ryan_greenblatt 14 Sep 2024 22:27 UTC
14 points
3
I don’t know if we can be confident in the exact 95% result, but o1 does consistently perform at a roughly similar level on math across a variety of different benchmarks (e.g., AIME), and other people have found strong performance on other math tasks which are unlikely to have been in the training corpus.