Neel Nanda
Karma: 10,114
My Research Process: Key Mindsets—Truth-Seeking, Prioritisation, Moving Fast
Neel Nanda · Apr 27, 2025, 2:38 PM · 9 points · 0 comments · 11 min read
How I Think About My Research Process: Explore, Understand, Distill
Neel Nanda · Apr 26, 2025, 10:31 AM · 35 points · 4 comments · 8 min read
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah and Neel Nanda · Mar 26, 2025, 7:07 PM · 109 points · 15 comments · 29 min read · (deepmindsafetyresearch.medium.com)
Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda · Mar 22, 2025, 10:13 AM · 291 points · 28 comments · 4 min read · (www.neelnanda.io)
Takeaways From Our Recent Work on SAE Probing
Josh Engels, Subhash Kantamneni, Senthooran Rajamanoharan and Neel Nanda · Mar 3, 2025, 7:50 PM · 30 points · 0 comments · 5 min read
The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research
Arthur Conmy and Neel Nanda · Feb 24, 2025, 2:17 AM · 48 points · 1 comment · 7 min read
MATS Applications + Research Directions I’m Currently Excited About
Neel Nanda · Feb 6, 2025, 11:03 AM · 73 points · 7 comments · 8 min read
Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann, Patrick Leask and Neel Nanda · Dec 19, 2024, 3:59 PM · 42 points · 6 comments · 11 min read
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can, Adam Karvonen, Johnny Lin, Curt Tigges, Joseph Bloom, chanind, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, CallumMcDougall, Kola Ayonrinde, Matthew Wearden, Sam Marks and Neel Nanda · Dec 11, 2024, 6:30 AM · 82 points · 6 comments · 2 min read · (www.neuronpedia.org)
Evolutionary prompt optimization for SAE feature visualization
neverix, Daniel Tan, Dmitrii Kharlapenko, Neel Nanda and Arthur Conmy · Nov 14, 2024, 1:06 PM · 21 points · 0 comments · 9 min read
SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane, robertzk, Neel Nanda and Arthur Conmy · Nov 7, 2024, 5:22 AM · 66 points · 4 comments · 14 min read
SAE Probing: What is it good for?
Subhash Kantamneni, Josh Engels, Senthooran Rajamanoharan and Neel Nanda · Nov 1, 2024, 7:23 PM · 33 points · 0 comments · 11 min read
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Oct 27, 2024, 6:46 PM · 47 points · 4 comments · 5 min read
SAE features for refusal and sycophancy steering vectors
neverix, Dmitrii Kharlapenko, Arthur Conmy and Neel Nanda · Oct 12, 2024, 2:54 PM · 29 points · 4 comments · 7 min read
Base LLMs refuse too
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Sep 29, 2024, 4:04 PM · 60 points · 20 comments · 10 min read
Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann, Michael Pearce, Patrick Leask, Joseph Bloom, Lee Sharkey and Neel Nanda · Aug 24, 2024, 12:56 AM · 68 points · 10 comments · 20 min read
Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask, Bart Bussmann and Neel Nanda · Aug 17, 2024, 1:16 AM · 53 points · 0 comments · 5 min read
Extracting SAE task features for in-context learning
Dmitrii Kharlapenko, neverix, Neel Nanda and Arthur Conmy · Aug 12, 2024, 8:34 PM · 31 points · 1 comment · 9 min read
Self-explaining SAE features
Dmitrii Kharlapenko, neverix, Neel Nanda and Arthur Conmy · Aug 5, 2024, 10:20 PM · 60 points · 13 comments · 10 min read
BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann, Patrick Leask and Neel Nanda · Jul 20, 2024, 2:20 AM · 59 points · 0 comments · 4 min read