A model/ensemble of models will achieve >90% on the MATH dataset using a no-calculator rule
Curious to hear if/how you would update your credence in this being achieved by 2026 or 2030 after seeing the 50%+ accuracy from Google’s Minerva. Your prediction seemed reasonable to me at the time, and this rapid progress seems like a piece of evidence favoring shorter timelines.
I’ve updated significantly. Unfortunately, though, I have not yet seen how well the model performs on the hardest difficulty level of the MATH dataset, which would give me a much better picture of how impressive this result is.
I’m pretty sure I will “win” my bet against him; even two months is a lot of time in AI these days.