Nate Thomas

Karma: 513

Redwood Research and Constellation

Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter

Nate Thomas
26 Oct 2023 3:07 UTC
42 points
10 comments
1 min read
LW link

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
1 comment
17 min read
LW link

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
34 points
2 comments
30 min read
LW link

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
18 points
4 comments
20 min read
LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
205 points
35 comments
20 min read
LW link
1 review

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

27 Oct 2022 1:32 UTC
135 points
14 comments
12 min read
LW link

High-stakes alignment via adversarial training [Redwood Research report]

5 May 2022 0:59 UTC
142 points
29 comments
9 min read
LW link

We’re Redwood Research, we do applied alignment research, AMA

Nate Thomas
6 Oct 2021 5:51 UTC
56 points
2 comments
2 min read
LW link
(forum.effectivealtruism.org)