
Zac Hatfield-Dodds

Karma: 3,058

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds, Oct 15, 2024, 4:46 PM
38 points
3 comments, 3 min read
(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds, May 20, 2024, 4:14 AM
30 points
21 comments, 10 min read
(www.anthropic.com)

Simple probes can catch sleeper agents

Apr 23, 2024, 9:10 PM
133 points
21 comments, 1 min read
(www.anthropic.com)

Third-party testing as a key ingredient of AI policy

Zac Hatfield-Dodds, Mar 25, 2024, 10:40 PM
11 points
1 comment, 12 min read
(www.anthropic.com)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds, Nov 1, 2023, 6:10 PM
83 points
1 comment, 4 min read
(www.anthropic.com)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds, Oct 5, 2023, 9:01 PM
288 points
22 comments, 2 min read, 1 review
(transformer-circuits.pub)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds, Sep 19, 2023, 3:09 PM
85 points
26 comments, 3 min read, 1 review
(www.anthropic.com)

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds, 9 Mar 2023 16:55 UTC
172 points
39 comments, 2 min read
(www.anthropic.com)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds, 14 Jan 2023 1:22 UTC
94 points
13 comments, 1 min read

In Defence of Spock

Zac Hatfield-Dodds, 21 Apr 2021 21:34 UTC
37 points
5 comments, 1 min read

Zac Hatfield-Dodds’s Shortform

Zac Hatfield-Dodds, 9 Mar 2021 2:39 UTC
2 points
13 comments