cloud
Karma: 424
Distillation Robustifies Unlearning
Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud and TurnTrout
Jun 13, 2025, 1:45 PM | 220 points | 11 comments | 8 min read | (arxiv.org)
Selective modularity: a research agenda
cloud and Jacob G-W
Mar 24, 2025, 4:12 AM | 66 points | 2 comments | 24 min read
[Question] Is weak-to-strong generalization an alignment technique?
cloud
Jan 31, 2025, 7:13 AM | 22 points | 1 comment | 2 min read
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evzen, Joseph Miller and TurnTrout
Dec 6, 2024, 10:19 PM | 165 points | 12 comments | 11 min read | (arxiv.org)