IMO that’s plausible, but it would be pretty misleading, since they described it as “o3-mini with high reasoning”, had “o3-mini (high)” in the chart, and “o3-mini high” is what they call a specific option in ChatGPT.
The reason my first thought was that they used more inference is that ARC Prize specifies that that’s how they got their ARC-AGI score (https://arcprize.org/blog/oai-o3-pub-breakthrough) - my read on this graph is that they spent $300k+ on getting their score, and with 100 questions in the semi-private eval that works out to roughly $3k+ per question. That was o3 high, not o3-mini high, but this result is pretty strong proof of concept that they’re willing to spend a lot on inference for good scores.
Fixed the link.