RSS

David Duvenaud

Karma: 671

My website is https://​​www.cs.toronto.edu/​​~duvenaud/​​

Grad­ual Disem­pow­er­ment: Sys­temic Ex­is­ten­tial Risks from In­cre­men­tal AI Development

Jan 30, 2025, 5:03 PM
159 points
52 comments2 min readLW link
(gradual-disempowerment.ai)

Sab­o­tage Eval­u­a­tions for Fron­tier Models

Oct 18, 2024, 10:33 PM
94 points
56 comments6 min readLW link
(assets.anthropic.com)

Sim­ple probes can catch sleeper agents

Apr 23, 2024, 9:10 PM
133 points
21 comments1 min readLW link
(www.anthropic.com)

Sleeper Agents: Train­ing De­cep­tive LLMs that Per­sist Through Safety Training

Jan 12, 2024, 7:51 PM
305 points
95 comments3 min readLW link
(arxiv.org)