Yes, but I think the larger update is that recent models from OpenAI are likely quite small, and inference-time compute usage creates more of an incentive for small models. It seems likely that (e.g.) o1-mini is quite small given that it generates at 220 tokens per second(!), perhaps <30 billion active parameters based on the link from Epoch given earlier. I'd guess (idk) ~100 billion total params. Likely something similar holds for o3-mini.
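For intuition, here's a rough back-of-envelope sketch of why 220 tokens/sec points to a small active-parameter count. It assumes single-stream decoding is memory-bandwidth bound (each token requires streaming all active weights from HBM), served on H100-class hardware with fp8 weights; the bandwidth figure, byte width, and efficiency factor are all my assumptions, not anything from the linked Epoch post.

```python
# Back-of-envelope: what does 220 tokens/sec/stream suggest about active params?
# All constants below are assumptions for illustration, not known serving details.

HBM_BANDWIDTH_BYTES_PER_S = 3.35e12  # H100 SXM HBM3 peak, approximate
BYTES_PER_PARAM = 1.0                # fp8 weights (assumption)
TOKENS_PER_S = 220                   # observed o1-mini generation speed
EFFICIENCY = 0.5                     # fraction of peak bandwidth achieved (guess)

# Bandwidth-bound decoding: each token streams all active weights once, so
# tokens/s ~= effective_bandwidth / (bytes_per_param * n_active_params).
max_active_params = (EFFICIENCY * HBM_BANDWIDTH_BYTES_PER_S
                     / (BYTES_PER_PARAM * TOKENS_PER_S))

print(f"Implied active params per GPU: ~{max_active_params / 1e9:.1f}B")
# ~7.6B under these assumptions. Tensor parallelism across N GPUs multiplies
# effective bandwidth by ~N, so e.g. 4-8 way sharding still leaves the
# <30B-active ballpark plausible at this generation speed.
```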
(I think the update from DeepSeek in particular might be smaller than you'd think, as export controls create an artificial incentive for smaller models.)