Zac Hatfield-Dodds

Karma: 2,746

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds · 15 Oct 2024 16:46 UTC
51 points
3 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds · 20 May 2024 4:14 UTC
30 points
21 comments · 10 min read · LW link
(www.anthropic.com)

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
133 points
21 comments · 1 min read · LW link
(www.anthropic.com)

Third-party testing as a key ingredient of AI policy

Zac Hatfield-Dodds · 25 Mar 2024 22:40 UTC
11 points
1 comment · 12 min read · LW link
(www.anthropic.com)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
287 points
21 comments · 2 min read · LW link
(transformer-circuits.pub)

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds · 19 Sep 2023 15:09 UTC
83 points
23 comments · 3 min read · LW link
(www.anthropic.com)

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · 9 Mar 2023 16:55 UTC
172 points
39 comments · 2 min read · LW link
(www.anthropic.com)

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
100 points
13 comments · 1 min read · LW link

In Defence of Spock

Zac Hatfield-Dodds · 21 Apr 2021 21:34 UTC
36 points
5 comments · 1 min read · LW link

Zac Hatfield Dodds’s Shortform

Zac Hatfield-Dodds · 9 Mar 2021 2:39 UTC
2 points
3 comments · 1 min read · LW link