Dmitrii Kharlapenko
Karma: 103
Evolutionary prompt optimization for SAE feature visualization
neverix, Daniel Tan, Dmitrii Kharlapenko, Neel Nanda and Arthur Conmy
14 Nov 2024 13:06 UTC, 16 points, 0 comments, 9 min read
SAE features for refusal and sycophancy steering vectors
neverix, Dmitrii Kharlapenko, Arthur Conmy and Neel Nanda
12 Oct 2024 14:54 UTC, 26 points, 4 comments, 7 min read
Extracting SAE task features for in-context learning
Dmitrii Kharlapenko, neverix, Neel Nanda and Arthur Conmy
12 Aug 2024 20:34 UTC, 31 points, 1 comment, 9 min read
Self-explaining SAE features
Dmitrii Kharlapenko, neverix, Neel Nanda and Arthur Conmy
5 Aug 2024 22:20 UTC, 60 points, 13 comments, 10 min read