Nicholas Goldowsky-Dill
Karma: 636
Interpretability Researcher at Apollo Research
Nicholas Goldowsky-Dill’s Shortform
Nicholas Goldowsky-Dill
6 Nov 2024 12:37 UTC, 5 points, 2 comments, 1 min read
A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey, Lucius Bushnaq, Dan Braun, StefanHex and Nicholas Goldowsky-Dill
18 Jul 2024 14:15 UTC, 117 points, 18 comments, 18 min read
Apollo Research 1-year update
Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, AlexMeinke and rusheb
29 May 2024 17:44 UTC, 93 points, 0 comments, 7 min read
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn
20 May 2024 17:53 UTC, 105 points, 4 comments, 3 min read
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning (arxiv.org)
Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey
17 May 2024 16:25 UTC, 57 points, 10 comments, 4 min read
Causal scrubbing: results on induction heads
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:59 UTC, 34 points, 1 comment, 17 min read
Causal scrubbing: results on a paren balance checker
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:59 UTC, 34 points, 2 comments, 30 min read
Causal scrubbing: Appendix
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:58 UTC, 18 points, 4 comments, 20 min read
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:58 UTC, 205 points, 35 comments, 20 min read, 1 review