The blog post has since been published. It contains the sentence “10x the compute of previous state-of-the-art models”, which is highly misleading: the claim from the video presentation is that it’s 10x Grok 2’s compute, and my estimate is about 3e26 FLOPs, or 3x the compute of GPT-4o.
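For concreteness, here is the kind of back-of-envelope arithmetic that lands in this range. The cluster size, per-GPU throughput, utilization, and training duration below are illustrative assumptions of mine, not figures from the blog post or the presentation; it's only a sketch of how a ~3e26 FLOPs estimate can come together.

```python
# Back-of-envelope pretraining compute sketch (all inputs are assumed, illustrative values).
gpus = 100_000            # assumed H100 cluster size
peak_flops = 1e15         # ~1e15 dense BF16 FLOP/s per H100 (approximate)
mfu = 0.4                 # assumed model FLOPs utilization
seconds = 90 * 86_400     # assumed ~3 months of training time

total = gpus * peak_flops * mfu * seconds
print(f"{total:.1e} FLOPs")   # ~3.1e26
```

Small changes to utilization or duration move the answer by tens of percent, so the point is the order of magnitude, not the exact figure.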
Being rushed is crucial context: there was maybe a month of post-training to produce the Chatbot Arena checkpoint. It feels smart, but it has much more trouble seeing intended meaning than Claude 3.6 Sonnet, creating a need for numerous caveats before it understands. I expect this will be fixed in a couple of months, but they couldn’t wait, or else it wouldn’t have had its SOTA moment.