lewis smith
Karma: 519
lewis smith’s Shortform
lewis smith | 30 Aug 2024 9:51 UTC | 12 points | 7 comments | 1 min read
The ‘strong’ feature hypothesis could be wrong
lewis smith | 2 Aug 2024 14:33 UTC | 216 points | 17 comments | 17 min read
Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, lewis smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah and Neel Nanda | 25 Apr 2024 18:43 UTC | 63 points | 38 comments | 1 min read | (arxiv.org)
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma | 19 Apr 2024 19:06 UTC | 73 points | 10 comments | 8 min read
[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma | 19 Apr 2024 19:06 UTC | 68 points | 0 comments | 3 min read
Dropout can create a privileged basis in the ReLU output model.
lewis smith | 28 Apr 2023 1:59 UTC | 24 points | 3 comments | 5 min read