Alex Mallen

Karma: 343

Redwood Research

Political sycophancy as a model organism of scheming

May 12, 2025, 5:49 PM
39 points
0 comments · 14 min read

Training-time schemers vs behavioral schemers

Alex Mallen · Apr 24, 2025, 7:07 PM
36 points
2 comments · 6 min read

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

Mar 24, 2025, 5:55 PM
34 points
0 comments · 8 min read

Measuring whether AIs can statelessly strategize to subvert security measures

Dec 19, 2024, 9:25 PM
62 points
0 comments · 11 min read

Balancing Label Quantity and Quality for Scalable Elicitation

Alex Mallen · Oct 24, 2024, 4:49 PM
31 points
1 comment · 2 min read

A quick experiment on LMs’ inductive biases in performing search

Alex Mallen · Apr 14, 2024, 3:41 AM
32 points
2 comments · 4 min read