David Scott Krueger (formerly: capybaralet)

Karma: 2,195

I’m more active on Twitter than LW/AF these days: https://twitter.com/DavidSKrueger

https://www.davidscottkrueger.com/

A review of “Why Did Environmentalism Become Partisan?”

David Scott Krueger (formerly: capybaralet) · Apr 25, 2025, 5:12 AM
15 points
0 comments · 4 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
162 points
52 comments · 2 min read · LW link
(gradual-disempowerment.ai)

A Sober Look at Steering Vectors for LLMs

Nov 23, 2024, 5:30 PM
38 points
0 comments · 5 min read · LW link

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet) · Sep 4, 2024, 12:40 PM
19 points
7 comments · 1 min read · LW link

An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet) · Jul 30, 2024, 9:44 PM
21 points
1 comment · 1 min read · LW link
(arxiv.org)