Nate Thomas
Karma: 513
Redwood Research and Constellation
Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter
Nate Thomas
26 Oct 2023 3:07 UTC, 42 points, 10 comments, 1 min read
Causal scrubbing: results on induction heads
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:59 UTC, 34 points, 1 comment, 17 min read
Causal scrubbing: results on a paren balance checker
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:59 UTC, 34 points, 2 comments, 30 min read
Causal scrubbing: Appendix
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:58 UTC, 18 points, 4 comments, 20 min read
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas
3 Dec 2022 0:58 UTC, 205 points, 35 comments, 20 min read
1 review
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau, Xander Davies, Buck and Nate Thomas
27 Oct 2022 1:32 UTC, 135 points, 14 comments, 12 min read
High-stakes alignment via adversarial training [Redwood Research report]
dmz, LawrenceC and Nate Thomas
5 May 2022 0:59 UTC, 142 points, 29 comments, 9 min read
We’re Redwood Research, we do applied alignment research, AMA
Nate Thomas
6 Oct 2021 5:51 UTC, 56 points, 2 comments, 2 min read
(forum.effectivealtruism.org)