David Lindner

Karma: 448

Alignment researcher at Google DeepMind

MONA: Three Months Later – Updates and Steganography Without Optimization Pressure

Apr 12, 2025, 11:15 PM
28 points
0 comments · 5 min read · LW link

Can LLMs learn Steganographic Reasoning via RL?

Apr 11, 2025, 4:33 PM
19 points
1 comment · 6 min read · LW link

MONA: Managed Myopia with Approval Feedback

Jan 23, 2025, 12:24 PM
80 points
29 comments · 9 min read · LW link

On scalable oversight with weak LLMs judging strong LLMs

Jul 8, 2024, 8:59 AM
49 points
18 comments · 7 min read · LW link
(arxiv.org)

VLM-RM: Specifying Rewards with Natural Language

Oct 23, 2023, 2:11 PM
20 points
2 comments · 5 min read · LW link
(far.ai)

Practical Pitfalls of Causal Scrubbing

Mar 27, 2023, 7:47 AM
87 points
17 comments · 13 min read · LW link

Threat Model Literature Review

Nov 1, 2022, 11:03 AM
78 points
4 comments · 25 min read · LW link

Clarifying AI X-risk

Nov 1, 2022, 11:03 AM
127 points
24 comments · 4 min read · LW link · 1 review