Fabien Roger

Karma: 5,262

Automated Researchers Can Subtly Sandbag

Mar 26, 2025, 7:13 PM
44 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
141 points
15 comments · 13 min read · LW link