Can
Karma: 257
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can, Adam Karvonen, Johnny Lin, Curt Tigges, Joseph Bloom, chanind, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, CallumMcDougall, Kola Ayonrinde, Matthew Wearden, Sam Marks and Neel Nanda
Dec 11, 2024, 6:30 AM
82 points, 6 comments, 2 min read, LW link (www.neuronpedia.org)
Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen, Sam Marks, Can, Benjamin Wright, Jannik Brinkmann, Logan Riggs and Rico Angell
2 Aug 2024 19:50 UTC
38 points, 1 comment, 9 min read, LW link
OthelloGPT learned a bag of heuristics
jylin04, JackS, Adam Karvonen and Can
2 Jul 2024 9:12 UTC
111 points, 10 comments, 9 min read, LW link
Past Tense Features
Can
20 Apr 2024 14:34 UTC
12 points, 0 comments, 4 min read, LW link
An adversarial example for Direct Logit Attribution: memory management in gelu-4l
Can, Yeu-Tong Lau, James Dao and Jett Janiak
30 Aug 2023 17:36 UTC
17 points, 0 comments, 8 min read, LW link (arxiv.org)
Understanding mesa-optimization using toy models
tilmanr, rusheb, Guillaume Corlouer, Dan Valentine, afspies, mivanitskiy and Can
7 May 2023 17:00 UTC
43 points, 2 comments, 10 min read, LW link
Safety of Self-Assembled Neuromorphic Hardware
Can
26 Dec 2022 18:51 UTC
16 points, 2 comments, 10 min read, LW link (forum.effectivealtruism.org)