There should probably be a dialogue between you and @Vladimir_Nesov over how much algorithmic improvements actually contribute to making AI more powerful, since this might reveal cruxes and help everyone else prepare better for the various AI scenarios.
For what it’s worth, it seems to me that Jack Clark of Anthropic is mostly in agreement with @Vladimir_Nesov about compute being the primary factor. Quoting from Jack’s blog here:
The world’s most capable open weight model is now made in China: …Tencent’s new Hunyuan model is a MoE triumph, and by some measures is world class… The world’s best open weight model might now be Chinese—that’s the takeaway from a recent Tencent paper that introduces Hunyuan-Large, a MoE model with 389 billion parameters (52 billion activated).
Why this matters—competency is everywhere, it’s just compute that matters: This paper seems generally very competent and sensible. The only key differentiator between this system and one trained in the West is compute—on the scaling law graph this model seems to come in somewhere between 10^24 and 10^25 flops of compute, whereas many Western frontier models are now sitting at between 10^25 and 10^26 flops. I think if this team of Tencent researchers had access to compute equivalent to their Western counterparts’ then this wouldn’t just be a world class open weight model—it might be competitive with the far more experienced proprietary models made by Anthropic, OpenAI, and so on. Read more: Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent (arXiv).
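As a rough sanity check on Jack's compute estimate, here's a minimal sketch using the standard C ≈ 6·N·D approximation for training compute (from the Kaplan et al. scaling-law literature), counting only the MoE's activated parameters. The 52B activated-parameter and ~7T training-token figures are from the Hunyuan-Large paper; everything else is my own back-of-the-envelope arithmetic, not anything from Jack's post.

```python
# Order-of-magnitude training-compute estimate via C ~= 6 * N * D,
# where N = activated parameters and D = training tokens.
# Assumptions: 52B activated params and ~7T pretraining tokens, both
# as reported in the Hunyuan-Large paper; real FLOP counts vary with
# architecture and training details, so treat this as a rough check.

activated_params = 52e9   # Hunyuan-Large: 52B activated of 389B total
training_tokens = 7e12    # ~7 trillion pretraining tokens

flops = 6 * activated_params * training_tokens
print(f"Estimated training compute: {flops:.2e} FLOPs")
# -> ~2.2e+24 FLOPs, which lands in the 10^24 to 10^25 range Jack
#    cites, versus 10^25 to 10^26 for many Western frontier models.
```

That the simple 6ND estimate falls inside Jack's quoted range is at least consistent with his reading of the paper's scaling-law graph, roughly an order of magnitude below the Western frontier.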