Zac Hatfield-Dodds

Karma: 3,058

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds

Oct 15, 2024, 4:46 PM

38 points

3 comments, 3 min read, LW link

(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds

May 20, 2024, 4:14 AM

30 points

21 comments, 10 min read, LW link

(www.anthropic.com)

Simple probes can catch sleeper agents

Monte M, Carson Denison, Zac Hatfield-Dodds, David Duvenaud, Sam Bowman, Ethan Perez and evhub

Apr 23, 2024, 9:10 PM

133 points

21 comments, 1 min read, LW link

(www.anthropic.com)

Third-party testing as a key ingredient of AI policy

Zac Hatfield-Dodds

Mar 25, 2024, 10:40 PM

11 points

1 comment, 12 min read, LW link

(www.anthropic.com)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds

Nov 1, 2023, 6:10 PM

83 points

1 comment, 4 min read, LW link

(www.anthropic.com)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds

Oct 5, 2023, 9:01 PM

288 points

22 comments, 2 min read, LW link, 1 review

(transformer-circuits.pub)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds

Sep 19, 2023, 3:09 PM

85 points

26 comments, 3 min read, LW link, 1 review

(www.anthropic.com)

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds

9 Mar 2023 16:55 UTC

172 points

39 comments, 2 min read, LW link

(www.anthropic.com)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds

14 Jan 2023 1:22 UTC

94 points

13 comments, 1 min read, LW link

In Defence of Spock

Zac Hatfield-Dodds

21 Apr 2021 21:34 UTC

37 points

5 comments, 1 min read, LW link

Zac Hatfield Dodds’s Shortform

Zac Hatfield-Dodds

9 Mar 2021 2:39 UTC

2 points

13 comments, LW link