
dmz

Karma: 494

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
138 points
15 comments · 13 min read · LW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz · Sep 17, 2022, 3:55 AM
143 points
12 comments · 6 min read · LW link · 1 review

High-stakes alignment via adversarial training [Redwood Research report]

5 May 2022 0:59 UTC
142 points
29 comments · 9 min read · LW link

Some criteria for sandwiching projects

dmz · 12 Aug 2021 3:40 UTC
18 points
1 comment · 4 min read · LW link

DMZ’s Shortform

dmz · 9 Aug 2021 1:18 UTC
1 point
1 comment · LW link