DeepSeek beats o1-preview on math, ties on coding; will release weights
DeepSeek-R1-Lite-Preview was announced today. It's available via DeepSeek's chatbot. (Post; translation of Chinese post.)
DeepSeek says it will release the weights and publish a report.
The model appears to be stronger than o1-preview on math, similar on coding, and weaker on other tasks.
DeepSeek is Chinese. I'm not really familiar with the company. I thought Chinese companies were at least a year behind the frontier; now I don't know what to think, and I hope people run more evals and play with this model. Chinese companies tend to game benchmarks more than frontier Western companies do, but I think DeepSeek hasn't gamed benchmarks much historically.
The post also shows o1-style inference-time scaling, with benchmark accuracy improving as the model is allowed to think for longer.
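For concreteness, here is a minimal sketch of how such a scaling curve is typically measured: run the same benchmark at several thought-token budgets and check whether accuracy climbs with budget. Everything here is a hypothetical illustration; `accuracy_at_budget`, `fake_solve`, and the toy benchmark are stand-ins, not DeepSeek's or OpenAI's actual interface.

```python
# Sketch of measuring inference-time scaling (illustrative only):
# evaluate a fixed benchmark at several thought-token budgets and
# see whether accuracy rises as the model gets more reasoning compute.

from typing import Callable

# (problem, reference answer) pairs; real evals would use AIME, MATH, etc.
Benchmark = list[tuple[str, str]]

def accuracy_at_budget(
    problems: Benchmark,
    budget: int,
    solve: Callable[[str, int], str],
) -> float:
    """Fraction of problems solved when the model's chain of thought
    is capped at `budget` tokens."""
    correct = sum(
        1 for problem, answer in problems
        if solve(problem, budget).strip() == answer.strip()
    )
    return correct / len(problems)

if __name__ == "__main__":
    # `fake_solve` stands in for a real model call (which would hit the
    # provider's API); it just simulates a model that needs a large
    # thought budget to get the answer right.
    def fake_solve(problem: str, budget: int) -> str:
        return "42" if budget >= 2048 else "?"

    problems: Benchmark = [("What is 6 * 7?", "42")]
    for budget in (512, 1024, 2048, 4096, 8192):
        print(budget, accuracy_at_budget(problems, budget, fake_solve))
```

An o1-style model should trace out a rising curve under this sweep, which is the pattern DeepSeek's chart shows.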
Note that o1 is substantially stronger than o1-preview; see the o1 post.
(Parts of this post and some of my comments are stolen from various people-who-are-not-me.)