Additional data points:
o1-preview and the new Claude 3.5 Sonnet both improved significantly over prior models on SimpleBench.
The math, coding, and science benchmarks in the o1 announcement post: