How do we know it was 3x? (If true, I agree with your analysis)
Do you take Grok 3 as an update on the importance of hardware scaling? If xAI used 5-10x more compute than any other model (which seems likely, though not confirmed), then the fact that it wasn’t discontinuously better than other models seems like evidence against the importance of hardware scaling.
I’m surprised they list bias and disinformation. Maybe this is a galaxy brained attempt to discredit AI safety by making it appear left-coded, but I doubt it. Seems more likely that x-risk focused people left the company while traditional AI ethics people stuck around and rewrote the website.
I’m very happy to see Meta publish this. It’s a meaningfully stronger commitment to avoiding deployment of dangerous capabilities than I expected them to make. Kudos to the people who pushed for companies to make these commitments and helped them do so.
One concern I have with the framework is that the “high” vs. “critical” risk thresholds may draw a distinction without a difference.
Deployments are high risk if they provide “significant uplift towards execution of a threat scenario (i.e. significantly enhances performance on key capabilities or tasks needed to produce a catastrophic outcome) but does not enable execution of any threat scenario that has been identified as potentially sufficient to produce a catastrophic outcome.” They are critical risk if they “uniquely enable the execution of at least one of the threat scenarios that have been identified as potentially sufficient to produce a catastrophic outcome.” The framework requires that threats be “net new,” meaning “The outcome cannot currently be realized as described (i.e. at that scale / by that threat actor / for that cost) with existing tools and resources.”
But what then is the difference between high risk and critical risk? Unless a threat scenario is currently impossible, any uplift towards achieving it more efficiently also “uniquely enables” it under a particular budget or set of constraints. For example, it is already possible for an attacker to create bio-weapons, as demonstrated by the anthrax attacks—so any cost reductions or time savings for any part of that process uniquely enable execution of that threat scenario within a given budget or timeframe. Thus it seems that no model can be classified as high risk if it provides uplift on an already-achievable threat scenario—instead, it must be classified as critical risk.
Does that logic hold? Am I missing something in my reading of the document?
Curious what you think of these arguments, which offer objections to the strategy stealing assumption in this setting, instead arguing that it’s difficult for capital owners to maintain their share of capital ownership as the economy grows and technology changes.
DeepSeek-R1 naturally learns to switch into other languages during CoT reasoning. When developers penalized this behavior, performance dropped. I think this suggests the CoT contains hidden information that cannot easily be verbalized in another language, and it provides evidence against the hope that reasoning CoT will be highly faithful by default.
Wouldn’t that conflict with the quote? (Though maybe they’re not doing what they’ve implied in the quote)
Process supervision seems like a plausible o1 training approach but I think it would conflict with this:
We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to “read the mind” of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.
I think it might just be outcome-based RL, training the CoT to maximize the probability of correct answers, maximize human preference reward model scores, or minimize next-token entropy.
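To make that hypothesis concrete, here’s a rough sketch (purely illustrative speculation on my part, not a claim about OpenAI’s actual setup) of what an outcome-based reward could look like:

```python
# Illustrative sketch of an outcome-based reward: the reward depends only on the
# final answer, so the CoT is shaped only indirectly, via whatever reasoning
# happens to lead to correct answers.

def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the final answer matches the reference, else 0.0.
    The chain of thought itself is never scored directly."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0

# A policy-gradient update would then reinforce whatever CoT tokens happened to
# precede a correct answer, leaving the content of the CoT unconstrained.
```

The point is that none of these objectives put direct optimization pressure on the CoT’s form, which is consistent with the quote above.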
This is my impression too. See e.g. this recent paper from Google, where LLMs critique and revise their own outputs to improve performance in math and coding.
Agreed, sloppy phrasing on my part. The letter clearly states some of Anthropic’s key views, but doesn’t discuss other important parts of their worldview. Overall this is much better than some of their previous communications and the OpenAI letter, so I think it deserves some praise, but your caveat is also important.
Really happy to see the Anthropic letter. It clearly states their key views on AI risk and the potential benefits of SB 1047. Their concerns seem fair to me: overeager enforcement of the law could be counterproductive. While I endorse the bill on the whole and wish they would too (and I think their lack of support for the bill is likely partially influenced by their conflicts of interest), this seems like a thoughtful and helpful contribution to the discussion.
I think there’s a decent case that SB 1047 would improve Anthropic’s business prospects, so I’m not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic’s business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.
The good faith interpretation of Anthropic’s argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.
My understanding is that LLCs can be legally owned and operated without any individual human being involved: https://journals.library.wustl.edu/lawreview/article/3143/galley/19976/view/
So I’m guessing an autonomous AI agent could own and operate an LLC, and use that company to purchase cloud compute and run itself, without breaking any laws.
Maybe if the model escaped from the possession of a lab, there would be other legal remedies available.
Of course, cloud providers could choose not to rent to an LLC run by an AI. This seems particularly likely if the government is investigating the issue as a natsec threat.
Over longer time horizons, it seems highly likely that people will deliberately create autonomous AI agents and deliberately release them into the wild with the goal of surviving and spreading, unless there are specific efforts to prevent this.
Has MIRI considered supporting work on human cognitive enhancement? e.g. Foresight’s work on WBE.
Very cool, thanks! This paper focuses on building a DS Agent, but I’d be interested to see a version of this paper that focuses on building a benchmark. It could evaluate several existing agent architectures, benchmark them against human performance, and leave significant room for improvement by future models.
I want to make sure we get this right, and I’m happy to change the article if we misrepresented the quote. I do think the current version is accurate, though perhaps it could be better. Let me explain how I read the quote, and then suggest possible edits, and you can tell me if they would be any better.
Here is the full Time quote, including the part we quoted (emphasis mine):
But, many of the companies involved in the development of AI have, at least in public, struck a cooperative tone when discussing potential regulation. Executives from the newer companies that have developed the most advanced AI models, such as OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei, have called for regulation when testifying at hearings and attending Insight Forums. Executives from the more established big technology companies have made similar statements. For example, Microsoft vice chair and president Brad Smith has called for a federal licensing regime and a new agency to regulate powerful AI platforms. Both the newer AI firms and the more established tech giants signed White House-organized voluntary commitments aimed at mitigating the risks posed by AI systems.
But in closed door meetings with Congressional offices, the same companies are often less supportive of certain regulatory approaches, according to multiple sources present in or familiar with such conversations. In particular, companies tend to advocate for very permissive or voluntary regulations. “Anytime you want to make a tech company do something mandatory, they’re gonna push back on it,” said one Congressional staffer.
Who are “the same companies” and “companies” in the second paragraph? The first paragraph specifically mentions OpenAI, Anthropic, and Microsoft. It also discusses broader groups of companies that include these three: “both the newer AI firms and the more established tech giants,” and “the companies involved in the development of AI [that] have, at least in public, struck a cooperative tone when discussing potential regulation.” OpenAI, Anthropic, and Microsoft, and possibly others in the mentioned reference classes, appear to be the “companies” that the second paragraph is discussing.
We summarized this as “companies, such as OpenAI and Anthropic, [that] have publicly advocated for AI regulation.” I don’t think that substantially changes the meaning of the quote. I’d be happy to change it to “OpenAI, Anthropic, and Microsoft” given that Microsoft was also explicitly named in the first paragraph. Do you think that would accurately capture the quote’s meaning? Or would there be a better alternative?
More discussion of this here. I’m really not sure what happened; I’d love to see more reporting on it.
Benchmarking LLM Agents on Kaggle Competitions
(Steve wrote this, I only provided a few comments, but I would endorse it as a good holistic overview of AIxBio risks and solutions.)
Curious what you think of arguments (1, 2) that AIs should be legally allowed to own property and participate in our economic system, thus giving misaligned AIs an alternative prosocial path to achieving their goals.