Does the MATH dataset have the worst scaling laws of all these tasks (and do math/logic tasks in general)?