
David Duvenaud

Karma: 680

My website is https://www.cs.toronto.edu/~duvenaud/

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

30 Jan 2025 17:03 UTC
159 points
52 comments, 2 min read
(gradual-disempowerment.ai)

Sabotage Evaluations for Frontier Models

18 Oct 2024 22:33 UTC
94 points
56 comments, 6 min read
(assets.anthropic.com)

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
133 points
21 comments, 1 min read
(www.anthropic.com)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

12 Jan 2024 19:51 UTC
305 points
95 comments, 3 min read
(arxiv.org)