Julian Stastny

Karma: 174

associate member of technical staff @ redwood research

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

May 8, 2025, 7:06 PM
75 points
1 comment · 15 min read · LW link

7+ tractable directions in AI control

Apr 28, 2025, 5:12 PM
83 points
1 comment · 13 min read · LW link

Disentangling four motivations for acting in accordance with UDT

Julian Stastny · Nov 5, 2023, 9:26 PM
35 points
3 comments · 7 min read · LW link