Alex Mallen
Karma: 343
Redwood Research
Political sycophancy as a model organism of scheming
Alex Mallen and Vivek Hebbar
May 12, 2025, 5:49 PM | 39 points | 0 comments | 14 min read
Training-time schemers vs behavioral schemers
Alex Mallen
Apr 24, 2025, 7:07 PM | 36 points | 2 comments | 6 min read
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?
Alex Mallen, charlie_griffin and Buck
Mar 24, 2025, 5:55 PM | 34 points | 0 comments | 8 min read
Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen and Buck
Dec 19, 2024, 9:25 PM | 62 points | 0 comments | 11 min read
Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen
Oct 24, 2024, 4:49 PM | 31 points | 1 comment | 2 min read
A quick experiment on LMs’ inductive biases in performing search
Alex Mallen
Apr 14, 2024, 3:41 AM | 32 points | 2 comments | 4 min read