Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev
Zac Hatfield-Dodds
I’d run the numbers for higher-throughput, lower-filtration filters—see e.g. the cleanairkits writeup—but this looks great!
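For anyone who wants to see what “running the numbers” looks like: clean air delivery is roughly airflow times single-pass capture efficiency, so a lower-grade filter with much higher throughput can come out ahead. The figures below are made up for illustration, not measurements of any particular product.

```python
# Minimal sketch of the filter trade-off: CADR ~= airflow * capture efficiency.
# All numbers are hypothetical placeholders, not measured values.

def cadr(airflow_cfm: float, capture_efficiency: float) -> float:
    """Approximate clean air delivery rate in cubic feet per minute."""
    return airflow_cfm * capture_efficiency

high_filtration = cadr(airflow_cfm=100, capture_efficiency=0.997)  # HEPA-like: high filtration, low throughput
high_throughput = cadr(airflow_cfm=350, capture_efficiency=0.85)   # MERV-13-like: lower filtration, high throughput

print(f"high-filtration option: ~{high_filtration:.0f} CFM of clean air")
print(f"high-throughput option: ~{high_throughput:.0f} CFM of clean air")
```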
Hey @MadHatter—Eliezer confirms that I’ve won our bet.
I ask that you donate my winnings to GiveWell’s All Grants fund, here, via credit card or ACH (preferred due to lower fees). Please check the box for “I would like to dedicate this donation to someone” and include zac@zhd.dev as the notification email address so that I can confirm here that you’ve done so.
IMO “major donors won’t fund this kind of thing” is a pretty compelling reason to look into it, since great opportunities which are illegible or structurally-hard-to-fund definitely exist (as do illegible-or-etc terrible options; do your diligence). On the other hand I’m pretty nervous about the community dynamics that emerge when you’re granting money and also socially engaged with and working in the field. Caveat donor!
I think your argument also has to establish that the cost of simulating any that happen to matter is also quite high.
My intuition is that capturing enough secondary mechanisms, in sufficient-but-abstracted detail that the simulated brain is behaviorally normal (e.g. a sim of me not-more-different than a very sleep-deprived me), is likely to be both feasible by your definition and sufficient for consciousness.
Why do you focus on this particular guy?
Because I saw a few posts discussing his trades, vs none for anyone else’s, which in turn is presumably because he moved the market by ten percentage points or so. I’m not arguing that this “should” make him so salient, but given that he was salient I stand by my sense of failure.
https://www.cleanairkits.com/products/luggables is basically one side of a Corsi-Rosenthal box, takes up very little floor space if placed by a wall, and is quiet, affordable, and effective.
SQLite is ludicrously well tested; similar bugs in other databases just don’t get found and fixed.
I don’t remember anyone proposing “maybe this trader has an edge”, even though incentivising such people to trade is the mechanism by which prediction markets work. Certainly I didn’t, and in retrospect it feels like a failure not to have had ‘the multi-million dollar trader might be smart money’ as a hypothesis at all.
(4) is infeasible, because voting systems are designed so that nobody can identify which voter cast which vote—including that voter. This property is called “coercion resistance”, which should immediately suggest why it is important!
I further object that any scheme to “win” an election by invalidating votes (or preventing them, etc) is straightforwardly unethical and a betrayal of the principles of democracy. Don’t give the impression that this is acceptable behavior, or even funny to joke about.
Let’s not kill the messenger, lest we run out of messengers.
Unfortunately we’re a fair way into this process, not because of downvotes[1] but rather because the comments are often dominated by uncharitable interpretations that I can’t productively engage with.[2] I’ve had researchers and policy people tell me that reading the discussion convinced them that engaging when their work was discussed on LessWrong wasn’t worth the trouble.
I’m still here, sad that I can’t recommend it to many others, and wondering whether I’ll regret this comment too.
[1] I also feel there’s a double standard, but don’t think it matters much. Span-level reacts would make it a lot easier to tell what people disagree with, though.
[2] Confidentiality makes any public writing far more effortful than you might expect. Comments which assume ill-faith are deeply unpleasant to engage with, and very rarely have any actionable takeaways. I’ve written and deleted a lot of other stuff here, and can’t find an object-level description that I think is worth posting, but there are plenty of further reasons.
I’d find the agree/disagree dimension much more useful if we split out “x people agree, y disagree”—as the EA Forum does—rather than showing the sum of weighted votes (and total number on hover).
I’d also encourage people to use the other reactions more heavily, including on substrings of a comment, but there’s value in the anonymous dis/agree counts too.
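To illustrate the difference with a toy sketch (not how the forum actually computes anything; the vote weights and data structure are assumptions for the example), the same set of votes can show a negative weighted sum even though most voters agreed:

```python
# Toy illustration: a single karma-weighted sum vs. split agree/disagree counts.
from dataclasses import dataclass

@dataclass
class Vote:
    agrees: bool
    weight: int  # e.g. some users' votes might carry more weight

votes = [Vote(True, 2), Vote(True, 1), Vote(False, 4), Vote(False, 1), Vote(True, 1)]

weighted_sum = sum(v.weight if v.agrees else -v.weight for v in votes)
agree_count = sum(v.agrees for v in votes)
disagree_count = sum(not v.agrees for v in votes)

print(f"summed display: {weighted_sum:+d}")                                # "-1"
print(f"split display:  {agree_count} agree, {disagree_count} disagree")   # "3 agree, 2 disagree"
```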
(2) ✅ … The first is from Chief of Staff at Anthropic.
The byline of that piece is “Avital Balwit lives in San Francisco and works as Chief of Staff to the CEO at Anthropic. This piece was written entirely in her personal capacity and does not reflect the views of Anthropic.”
I do not think this is an appropriate citation for the claim. In any case, the claim that they “publicly state that it is not a matter of ‘if’ such artificial superintelligence might exist, but ‘when’” simply seems to be untrue; both cited sources are peppered with phrases like ‘possibility’, ‘I expect’, ‘could arrive’, and so on.
If grading I’d give full credit for (2) on the basis of “documents like these” referring to Anthropic’s constitution + system prompt and OpenAI’s model spec, and more generous partials for the others. I have no desire to litigate details here though, so I’ll leave it at that.
Proceeding with training or limited deployments of a “potentially existentially catastrophic” system would clearly violate our RSP, at minimum the commitment to define and publish ASL-4-appropriate safeguards and conduct evaluations confirming that they are not yet necessary. This footnote is referring to models which pose much lower levels of risk.
And it seems unremarkable to me for a US company to ‘single out’ a relevant US government entity as the recipient of a voluntary non-public disclosure of a non-existential risk.
More importantly, the average price per plate is not just a function of costs, it’s a function of the value that people receive.
No, willingness to pay is (ideally) a function of value, but under reasonable competition the price should approach the cost of providing the meal. “It’s weird” that a city with many restaurants and consumers, easily available information, low transaction costs, lowish barriers to entry, minimal externalities or returns to atypical scale, and good factor mobility (at least for labor, capital, and materials) should still have substantially elevated prices. My best guess is that barriers to entry aren’t that low, but mostly that profit-seekers prefer industries with fewer of these conditions!
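For the record, the textbook condition I’m gesturing at (a sketch of the standard free-entry model, not a claim about any particular restaurant market):

$$\pi = \big(p - AC(q)\big)\,q > 0 \;\Rightarrow\; \text{more entry}, \qquad \text{so with free entry } p \to \min_q AC(q)$$

i.e. price tracks the cost of providing a meal, and willingness to pay only caps it from above.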
A more ambitious procedural approach would involve strong third-party auditing.
I’m not aware of any third party who could currently perform such an audit—e.g. METR disclaims that here. We committed to soliciting external expert feedback on capabilities and safeguards reports (RSP §7), and to funding new third-party evaluators to grow the ecosystem. Right now, though, third-party audit feels to me like a fabricated option rather than a lack of ambition.
Thanks Daniel (and Dean) - I’m always glad to hear about people exploring common ground, and the specific proposals sound good to me too.
I think Anthropic already does most of these, as of our RSP update this morning! While I personally support regulation to make such actions universal and binding, I’m glad that we have voluntary commitments in the meantime:
Disclosure of in-development capabilities—in section 7.2 (Transparency and External Input) of our updated RSP, we commit to public disclosures for deployed models, and to notify a relevant U.S. Government entity if any model requires stronger protections than the ASL-2 Standard. I think this is a reasonable balance for a unilateral commitment.
Disclosure of training goal / model spec—as you note, Anthropic publishes both the constitution we train with and our system prompts. I’d be interested in exploring model-spec-style aspirational documents too.
Public discussion of safety cases and potential risks—there’s some discussion in our Core Views essay and RSP; our capability reports and plans for safeguards and future evaluations are published here starting today (with some redactions for e.g. misuse risks).
Whistleblower protections—RSP section 7.1.5 lays out our noncompliance reporting policy, and 7.1.6 a commitment not to use non-disparagement agreements which could impede or discourage publicly raising safety concerns.
Anthropic’s updated Responsible Scaling Policy
This sounds to me like the classic rationalist failure mode of doing stuff which is unusually popular among rationalists, rather than studying what experts or top performers are doing and then adopting the techniques, conceptual models, and ways of working that actually lead to good results.
Or in other words, the primary thing when thinking about how to optimize a business is not being rationalist; it is to succeed in business (according to your chosen definition).
Happily there’s considerable scholarship on business, and CommonCog has done a fantastic job organizing and explaining the good parts. I highly recommend reading and discussing and reflecting on the whole site—it’s a better education in business than any MBA program I know of.
We’ve been in touch, and agreed that MadHatter will make the donation by the end of February. I’ll post a final update in this thread when I get the confirmation from GiveWell.