Alex Mallen

Karma: 259

Redwood Research

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

Mar 24, 2025, 5:55 PM
30 points
0 comments, 8 min read

Measuring whether AIs can statelessly strategize to subvert security measures

Dec 19, 2024, 9:25 PM
61 points
0 comments, 11 min read

Balancing Label Quantity and Quality for Scalable Elicitation

Alex Mallen, Oct 24, 2024, 4:49 PM
31 points
1 comment, 2 min read

A quick experiment on LMs’ inductive biases in performing search

Alex Mallen, Apr 14, 2024, 3:41 AM
32 points
2 comments, 4 min read