AI strategy & governance. ailabwatch.org. ailabwatch.substack.com.
Zach Stein-Perlman
Everyone knew everyone knew everyone knew everyone knew someone had blue eyes. But not everyone knew that, so there wasn’t common knowledge, until the sailor made it so.
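(Tangent: here's a minimal possible-worlds sketch of that distinction. It assumes the standard blue-eyed islanders setup with five blue-eyed islanders and, for simplicity, no one else, so that exactly four levels of “everyone knows” hold before any announcement; the `nested_knowledge_depth` helper is my own illustration, not something from the original puzzle.)

```python
from itertools import product

def nested_knowledge_depth(n):
    """Largest k such that E^k("someone has blue eyes") holds in the actual world,
    where all n islanders have blue eyes, each sees everyone's eyes but their own,
    and E is the "everyone knows" operator over the usual possible-worlds model."""
    worlds = list(product((True, False), repeat=n))  # True = blue eyes
    actual = tuple([True] * n)

    def accessible(world, agent):
        # Worlds the agent can't rule out: identical to `world` except possibly
        # in the agent's own (unseen) eye color.
        return [w for w in worlds
                if all(w[j] == world[j] for j in range(n) if j != agent)]

    def everyone_knows(prop):
        # E(prop): worlds where every agent's accessible worlds all satisfy prop.
        return {w for w in worlds
                if all(all(v in prop for v in accessible(w, agent))
                       for agent in range(n))}

    prop = {w for w in worlds if any(w)}  # "someone has blue eyes"
    depth = 0
    while actual in prop:
        prop = everyone_knows(prop)
        depth += 1
    return depth - 1

print(nested_knowledge_depth(5))  # 4: four nested levels of "everyone knows" hold,
                                  # the fifth fails, so it's not common knowledge
```

The sailor’s public announcement is what makes “someone has blue eyes” hold at every depth of “everyone knows,” i.e. common knowledge, which is what lets the usual induction start.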
I think the conclusion is not “Epoch shouldn’t have hired Matthew, Tamay, and Ege” but rather “[Epoch / its director] should have been better at avoiding negative-EV projects (e.g. computer use evals)” (and shouldn’t have given Tamay leadership-y power such that he could cause Epoch to do negative-EV projects; I don’t know if that’s what happened, but it seems likely).
Good point. You’re right [edit: about Epoch].
I should have said: the vibe I’ve gotten from Epoch and Matthew/Tamay/Ege in private in the last year is not safety-focused. (Not that I really know all of them.)
(ha ha but Epoch and Matthew/Tamay/Ege were never really safety-focused, and certainly not bright-eyed standard-view-holding EAs, I think)
Accelerating AI R&D automation would be bad. But they want to accelerate misc labor automation. The sign of this is unclear to me.
wow
I think this stuff is mostly a red herring: the safety standards in OpenAI’s new PF are super vague and so it will presumably always be able to say it meets them and will never have to use this.[1]
But if this ever matters, I think it’s good: it means OpenAI is more likely to make such a public statement and is slightly less incentivized to deceive employees + external observers about capabilities and safeguard adequacy. OpenAI unilaterally pausing is not on the table; if safeguards are inadequate, I’d rather OpenAI say so.
[1] I think my main PF complaints are:
- The High standard is super vague: it’s just “safeguards should sufficiently minimize the risk of severe harm,” and the level of evidence is totally unspecified for the “potential safeguard efficacy assessments.”
- Some of the misalignment safeguards are confused/bad, which is bad since per the PF they may be disjunctive: if OpenAI is wrong about a single “safeguard efficacy assessment,” that can invalidate the whole plan.
- It’s bad that misalignment safeguards are only clearly triggered by cyber capabilities, especially since the cyber High threshold is vague / too high.
For more see OpenAI rewrote its Preparedness Framework.
I don’t know. I don’t have a good explanation for why OpenAI hasn’t released o3. Delaying to do lots of risk assessment would be confusing because they did little risk assessment for other models.
OpenAI slashes AI model safety testing time, FT reports. This is consistent with lots of past evidence about OpenAI’s evals for dangerous capabilities being rushed, being done on weak checkpoints, and having worse elicitation than OpenAI has committed to.
This is bad because OpenAI is breaking its commitments (and isn’t taking safety stuff seriously and is being deceptive about its practices). It’s also kinda bad in terms of misuse risk, since OpenAI might fail to notice that its models have dangerous capabilities. I’m not saying OpenAI should delay deployments for evals: there may be strategies that are better (similar misuse-risk reduction with less cost to the company) than doing detailed evals for dangerous capabilities before external deployment, e.g. generally doing the slow/expensive evals after the model is done (even if you deploy externally before finishing evals), keeping a safety buffer, and increasing the sensitivity of your filters early in deployment (when you’re less certain about risk). But OpenAI isn’t doing that; it’s just doing a bad job of the evals-before-external-deployment plan.
(Regardless, maybe short-term misuse isn’t so scary, and maybe short-term misuse risk comes more from open-weights or stolen models than from models that can be undeployed/mitigated if misuse risks appear during deployment. And insofar as you’re more worried about risks from internal deployment, maybe you should focus on evals and mitigations relevant to those threats rather than on external deployment. (OpenAI’s doing even worse on risks from internal deployment!))
tl;dr: OpenAI is doing risk assessment poorly,[1] but maybe “do detailed evals for dangerous capabilities before external deployment” isn’t a great ask.
[1] But DeepMind and Anthropic are similar, and those three are better than any other AI company.
I think this isn’t taking powerful AI seriously. I think the quotes below are quite unreasonable, and only ~half of the research agenda is plausibly important given that there will be superintelligence. So I’m pessimistic about this agenda/project relative to, say, the Forethought agenda.
AGI could lead to massive labor displacement, as studies estimate that between 30%–47% of jobs could be directly replaceable by AI systems. . . .
AGI could lead to stagnating or falling wages for the majority of workers if AI technology replaces people faster than it creates new jobs.
We could see an increase of downward social mobility for workers, as traditionally “high-skilled” service jobs become cheaply performed by AI, but manual labor remains difficult to automate due to the marginal costs of deploying robotics. These economic pressures could reduce the bargaining power of workers, potentially forcing more people towards gig economy roles or less desirable (e.g. physically demanding) jobs.
Simultaneously, upward social mobility could decline dramatically as ambitious individuals from lower-class backgrounds may lose their competitive intellectual advantages to AGI systems. Fewer entry-level skilled jobs and the reduced comparative value of intelligence could reduce pathways to success – accelerating the hollowing out of the middle class.
If AGI systems are developed, evidence across the board points towards the conclusion that the majority of workers could likely lose out from this coming economic transformation. A core bargain of our society – if you work hard, you can get ahead – may become tenuous if opportunities for advancement and economic security dry up.
Update: based on nonpublic discussion, I think maybe Deric is focused on the scenario where the world is a zillion times wealthier and humanity is in control but many humans have a standard of living that is bad by 2025 standards. I’m not worried about this because it would take a tiny fraction of resources to fix that. (Like, if it only cost 0.0001% of global resources to end poverty forever, someone would do it.)
My guess:
This is about software tasks, or specifically “well-defined, low-context, measurable software tasks that can be done without a GUI.” It doesn’t directly generalize to solving puzzles or writing important papers. It probably does generalize within that narrow category.
If this were trying to measure all tasks, tasks that AIs can’t do would count toward the failure rate; note that the main graph is about the 50% success rate, not 100%. If we were worried that this is misleading because AIs are differentially bad at crucial tasks or something, we could look at the success rate on those tasks specifically.
I don’t know, maybe nothing. (I just meant that on current margins, maybe the quality of the safety team’s plans isn’t super important.)
I haven’t read most of the paper, but based on the Extended Abstract I’m quite happy about both the content and how DeepMind (or at least its safety team) is articulating an “anytime” (i.e., possible to implement quickly) plan for addressing misuse and misalignment risks.
But I think safety at Google DeepMind is more bottlenecked by buy-in from leadership to do moderately costly things than the safety team having good plans and doing good work. [Edit: I think the same about Anthropic.]
I expect they will thus not want to use my quotes.
Yep, my impression is that it violates the journalistic code to negotiate with sources, e.g. to get better access in exchange for writing specific things about them.
My strong upvotes are giving +61 :shrug:
Minor Anthropic RSP update.
I don’t know what e.g. the “4” in “AI R&D-4” means; perhaps it is a mistake.[1]
Sad that the commitment to specify the AI R&D safety case thing was removed, and sad that “accountability” was removed.
Slightly sad that AI R&D capabilities triggering >ASL-3 security went from “especially in the case of dramatic acceleration” to applying only in that case.
Full AI R&D-5 description from appendix:
AI R&D-5: The ability to cause dramatic acceleration in the rate of effective scaling. Specifically, this would be the case if we observed or projected an increase in the effective training compute of the world’s most capable model that, over the course of a year, was equivalent to two years of the average rate of progress during the period of early 2018 to early 2024. We roughly estimate that the 2018-2024 average scaleup was around 35x per year, so this would imply an actual or projected one-year scaleup of 35^2 = ~1000x.
The 35x/year scaleup estimate is based on assuming the rate of increase in compute being used to train frontier models from ~2018 to May 2024 is 4.2x/year (reference), the impact of increased (LLM) algorithmic efficiency is roughly equivalent to a further 2.8x/year (reference), and the impact of post training enhancements is a further 3x/year (informal estimate). Combined, these have an effective rate of scaling of 35x/year.
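(A quick sanity check on that arithmetic; the numbers below are just the figures quoted above, and the script is my own back-of-the-envelope, not Anthropic’s.)

```python
# Recomputing the effective-scaling estimate quoted above (my arithmetic, not Anthropic's code).
compute_growth = 4.2          # training compute for frontier models, x/year (~2018 to May 2024)
algorithmic_efficiency = 2.8  # effective-compute equivalent of algorithmic progress, x/year
post_training = 3.0           # informal estimate for post-training enhancements, x/year

effective_scaling = compute_growth * algorithmic_efficiency * post_training
print(effective_scaling)       # ~35.3x/year, matching the ~35x/year estimate
print(effective_scaling ** 2)  # ~1245x, i.e. the "two years of progress in one year" ~1000x threshold
```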
Also, from the changelog:
Iterative Commitment: We have adopted a general commitment to reevaluate our Capability Thresholds whenever we upgrade to a new set of Required Safeguards. We have decided not to maintain a commitment to define ASL-N+1 evaluations by the time we develop ASL-N models; such an approach would add unnecessary complexity because Capability Thresholds do not naturally come grouped in discrete levels. We believe it is more practical and sensible instead to commit to reconsidering the whole list of Capability Thresholds whenever we upgrade our safeguards.
I’m confused by the last sentence.
See also https://www.anthropic.com/rsp-updates.
[1] Jack Clark misspeaks on twitter, saying “these updates clarify . . . our ASL-4/5 thresholds for AI R&D.” But AI R&D-4 and AI R&D-5 trigger ASL-3 and ASL-4 safeguards, respectively; the RSP doesn’t mention ASL-5. Maybe the RSP is wrong and AI R&D-4 is supposed to trigger ASL-4 security, and same for 5; that would make the terms “AI R&D-4” and “AI R&D-5” make more sense...
A. Many AI safety people don’t support relatively responsible companies unilaterally pausing, which PauseAI advocates. (Many do support governments slowing AI progress, or preparing to do so at a critical point in the future. And many of those don’t see that as tractable for them to work on.)
B. “Pausing AI” is indeed more popular than PauseAI, but it’s not clearly possible to make a more popular version of PauseAI that actually does anything; any such organization will have strategy/priorities/asks/comms that alienate many of the people who think “yeah I support pausing AI.”
C.
There does not seem to be a legible path to prevent possible existential risks from AI without slowing down its current progress.
This seems confused. Obviously P(doom | no slowdown) < 1. Many people’s work reduces risk in both slowdown and no-slowdown worlds, and it seems pretty clear to me that most of them shouldn’t switch to working on increasing P(slowdown).
It is often said that control is for safely getting useful work out of early powerful AI, not making arbitrarily powerful AI safe.
If it turns out that a large, rapid, local capabilities increase is possible, the leading developer could still opt to spend some inference compute on safety research rather than all of it on capabilities research.
Is that your true rejection? (I’d be surprised if you think the normalizing-libel-suits effect is nontrivial.)