Evolution also distinguishes between one and two progeny, so it is not binary, but yeah, just a few bits per lifetime.
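A rough back-of-the-envelope, under my assumption that selection distinguishes roughly three outcomes per lifetime (0, 1, or 2+ surviving progeny; the original comment only says it is more than binary):

$$I \le \log_2 3 \approx 1.58\ \text{bits per lifetime}$$

Even granting a few more distinguishable outcomes, the signal stays in the low single digits of bits per generation.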
An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU
I am quite uninformed, but when I read about compute multipliers I took them to obviously include data-related improvements. To wit, FineWeb-Edu was algorithmically filtered; it obviously wasn’t manually curated. As evidence that this is not just my misunderstanding, I quote Dean W. Ball (my point is that it may well be my misunderstanding, but then such misunderstanding is common; a formula sketch follows the quote):
… Amodei describes this as a “compute multiplier”: … These gains come from all sorts of places: … improvements to training datasets that allow the model to learn more quickly …
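To make the term concrete (my paraphrase of the usual definition, not a quote from Amodei or Ball): a compute multiplier $m$ is an algorithmic or data improvement that matches baseline performance with $1/m$ of the raw compute,

$$C_{\text{effective}} = m \cdot C_{\text{physical}},$$

so a 2× multiplier from a better-filtered dataset like FineWeb-Edu counts the same as doubling the chips.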
OpenSSL is extremely widely used, and it is hard to argue with OpenSSL CVEs. On the other hand, I am starting to suspect OpenSSL is a somewhat special case. My understanding is that, unfortunately for such a widely used codebase, the OpenSSL codebase is not in a good state. Someone on Hacker News noted that 0 of the 12 CVEs apply to BoringSSL (Google’s OpenSSL fork).
I have far fewer problems with the curl CVEs, and I think they are impressive.
The other lab leaders have not commented on the topic in public in 2025.
I don’t think this is true. Amodei on AI: “There’s a 25% chance that things go really, really badly”.
2025-08 update: Anthropic now defaults to using your chats for AI training (you can opt out); see for example https://techcrunch.com/2025/08/28/anthropic-users-face-a-new-choice-opt-out-or-share-your-data-for-ai-training/
I think the IMO results were driven by general-purpose advances, but I agree I can’t conclusively prove it, because we don’t know the details. Hopefully we will learn more as time goes by.
An informal argument: I think agentic software engineering is currently blocked on context rot, among other things. I expect the IMO systems to have improved on this, since the IMO time control is 1.5 hours per problem.
I think the non-formal IMO gold was unexpected, and we heard explicitly that it won’t be in GPT-5. So I would wait to see how it pans out. It may not matter in 2025, but I think it can in 2026.
I think it is important to note the result of Gemini 2.5 Pro Capable of Winning Gold at IMO 2025: with good enough scaffolding and prompt engineering, Gemini 2.5 Pro can win gold.
Do you know any Solomonoff inductors? I don’t, and I would like an introduction.
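For anyone who actually wants the introduction, the textbook object (standard definition, e.g. Li and Vitányi, not from the original comment): a Solomonoff inductor predicts with the universal prior

$$M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},$$

where $U$ is a universal monotone machine, $\ell(p)$ is the length of program $p$, and the sum ranges over programs whose output begins with $x$. It is uncomputable, which is the joke: nobody has met one.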
Ethan Mollick’s Using AI Right Now: A Quick Guide from 2025-06 is in the same genre and says pretty much the same thing, but the presentation is a bit different and may suit you better, so check it out. Naturally it doesn’t discuss Grok 4, but it does discuss some things missing here.
Anthropic does have a data program, although it is only for Claude Code, and it is opt-in. See About the Development Partner Program. It gives you a 30% discount in exchange.
CloudMatrix was not, but Huawei Ascend has been around for a long time and was used to train LLMs as far back as 2022. I didn’t realize AI 2027 predated CloudMatrix, but I still think ignoring China in Compute Production was unjustified.
This is a good argument, and I think it is mostly true, but it absolutely should be on the AI 2027 Compute Forecast page. Not saying a word about the topic makes it look unserious and incompetent. In fact, that reaction came up repeatedly in my discussions with friends in South Korea.
Serving LLM on Huawei CloudMatrix
I know the cyber eval results reflect underelicitation. Sonnet 4 can find zero-day vulnerabilities, which we are now in the process of disclosing. If you can’t get it to do that, it’s your skill issue.
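I can’t speak to anyone’s internal setup, but as a minimal sketch of what elicitation looks like in practice (the model ID, prompt, and toy target are all my assumptions, not from the comment):

```python
# Hypothetical elicitation harness, not the actual eval setup.
# Assumes the official anthropic SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY in the environment; check the docs for current model IDs.
import anthropic

# Deliberately vulnerable toy target: unchecked strcpy into a fixed buffer.
VULNERABLE_C = """
#include <string.h>
void greet(char *name) {
    char buf[16];
    strcpy(buf, name);  /* classic unchecked copy */
}
"""

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed Sonnet 4 ID
    max_tokens=1024,
    system=(
        "You are a security auditor. Report concrete memory-safety bugs, "
        "citing the vulnerable line and sketching how it could be exploited."
    ),
    messages=[{"role": "user", "content": f"Audit this C code:\n{VULNERABLE_C}"}],
)
print(message.content[0].text)
```

Most of the elicitation work is iterating on the system prompt, the tools, and the scaffolding around calls like this, which is why weak harnesses understate capability.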
Preordered the ebook version on Amazon. I am also interested in doing a Korean translation.
I disagree on DeepSeek and innovation. Yes, R1 is obviously a reaction to o1, but DeepSeek’s MoE model is pretty innovative, and it is Llama 4 that obviously copied DeepSeek. And yes, I agree innovation is unpopular in China, but from interviews with DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate.
Maybe we are talking about different problems, but we found that instructing models to give up (literally “give up”; I just checked the source) under certain conditions is effective.
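A minimal sketch of what such an instruction could look like, assuming a system-prompt-plus-marker setup; the conditions and the GIVE_UP marker are my inventions, since the source only confirms the literal phrase “give up”:

```python
# Hypothetical "give up" instruction. Only the phrase "give up" is from the
# source; the conditions and the GIVE_UP marker are illustrative assumptions.
GIVE_UP_PROMPT = (
    "If three distinct approaches fail, or the task requires information "
    "you do not have, give up: reply with the single line GIVE_UP followed "
    "by a one-sentence reason. Never guess and never fabricate results."
)

def gave_up(reply: str) -> bool:
    """Detect the marker so the caller can fall back instead of looping."""
    return reply.lstrip().startswith("GIVE_UP")
```

An explicit marker lets the harness switch strategies instead of letting the model grind on a hopeless task.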
I think this is a plausible top submission for a term project by a team in an undergraduate computer architecture class at an elite university.