DeepSeek beats o1-preview on math, ties on coding; will release weights

DeepSeek-R1-Lite-Preview was announced today. It's available to try via DeepSeek's chatbot. (Post; translation of the Chinese post.)

DeepSeek says it will release the weights and publish a report.

The model appears to be stronger than o1-preview on math, similar on coding, and weaker on other tasks.

DeepSeek is Chinese. I'm not really familiar with the company. I thought Chinese companies were at least a year behind the frontier; now I don't know what to think, and I hope people run more evals and play with this model. Chinese companies tend to game benchmarks more than the frontier Western companies do, but I think DeepSeek hasn't gamed benchmarks much historically.

The post also shows inference-time scaling, like o1: performance keeps improving as the model spends more tokens thinking before answering.
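For readers unfamiliar with the term: "inference-time scaling" means spending more compute at inference time to get better answers, e.g. via longer chains of thought or aggregating many samples. Below is a minimal sketch of one simple form, majority voting over sampled answers (self-consistency); this is not DeepSeek's or OpenAI's actual method, and `query_model` is a hypothetical stand-in for a real API call.

```python
from collections import Counter

def query_model(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for an API call returning the model's final answer."""
    raise NotImplementedError("replace with a real model/API call")

def solve_with_voting(prompt: str, n_samples: int = 16) -> str:
    # Each extra sample spends more inference-time compute; aggregating
    # them trades compute for accuracy, which is the shape of the
    # scaling curves these posts show.
    answers = [query_model(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```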

Note that o1 is substantially stronger than o1-preview; see the o1 post.

(Parts of this post and some of my comments are stolen from various people-who-are-not-me.)