Yes, but I think the larger update is that recent models from OpenAI are likely quite small, and inference-time compute usage creates more of an incentive for small models. It seems likely that (e.g.) o1-mini is quite small given that it generates at 220 tokens per second(!), perhaps <30 billion active parameters based on the link from Epoch given earlier. I'd guess (idk) ~100 billion total params. Likely something similar holds for o3-mini.
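For intuition, here's a rough back-of-envelope sketch of why 220 tokens/sec points to a small active-parameter count. It assumes single-stream decoding is memory-bandwidth bound (each token requires streaming all active weights from HBM), served on H100-class hardware with fp8 weights; the bandwidth figure, byte width, and efficiency factor are all my assumptions, not anything from the linked Epoch post.

```python
# Back-of-envelope: what does 220 tokens/sec/stream suggest about active params?
# All constants below are assumptions for illustration, not known serving details.

HBM_BANDWIDTH_BYTES_PER_S = 3.35e12  # H100 SXM HBM3 peak, approximate
BYTES_PER_PARAM = 1.0                # fp8 weights (assumption)
TOKENS_PER_S = 220                   # observed o1-mini generation speed
EFFICIENCY = 0.5                     # fraction of peak bandwidth achieved (guess)

# Bandwidth-bound decoding: each token streams all active weights once, so
# tokens/s ~= effective_bandwidth / (bytes_per_param * n_active_params).
max_active_params = (EFFICIENCY * HBM_BANDWIDTH_BYTES_PER_S
                     / (BYTES_PER_PARAM * TOKENS_PER_S))

print(f"Implied active params per GPU: ~{max_active_params / 1e9:.1f}B")
# ~7.6B under these assumptions. Tensor parallelism across N GPUs multiplies
# effective bandwidth by ~N, so e.g. 4-8 way sharding still leaves the
# <30B-active ballpark plausible at this generation speed.
```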
(I think the update from DeepSeek in particular might be smaller than you'd think, as export controls create an artificial incentive for smaller models.)