mattmacdermott comments on Stephen McAleese’s Shortform

mattmacdermott 14 Sep 2024 21:50 UTC
11 points
0
Do we know that the test set isn’t in the training data?
- ryan_greenblatt 14 Sep 2024 22:27 UTC
  14 points
  3
  I don’t know if we can be confident in the exact 95% result, but o1 does consistently perform at a roughly similar level on math across a variety of different benchmarks (e.g., AIME), and other people have found strong performance on other math tasks that are unlikely to have been in the training corpus.