LawrenceC

Karma: 5,366

I do AI Alignment research. Currently at METR, but previously at: Redwood Research, UC Berkeley, Good Judgment Project.

I’m also a part-time fund manager for the LTFF.

Obligatory research billboard website: https://chanlawrence.me/

OpenAI/Microsoft announce “next generation language model” integrated into Bing/Edge

LawrenceC · Feb 7, 2023, 8:38 PM
79 points
4 comments · 1 min read · LW link
(blogs.microsoft.com)

Evaluations (of new AI Safety researchers) can be noisy

LawrenceC · Feb 5, 2023, 4:15 AM
132 points
11 comments · 16 min read · LW link · 1 review

The Alignment Problem from a Deep Learning Perspective (major rewrite)

Jan 10, 2023, 4:06 PM
84 points
8 comments · 39 min read · LW link
(arxiv.org)

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceC · Jan 5, 2023, 5:54 PM
53 points
11 comments · 1 min read · LW link
(transformer-circuits.pub)

Touch reality as soon as possible (when doing machine learning research)

LawrenceC · Jan 3, 2023, 7:11 PM
117 points
9 comments · 8 min read · LW link · 1 review

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC · Dec 19, 2022, 10:52 PM
150 points
30 comments · 18 min read · LW link

Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC · Dec 16, 2022, 10:12 PM
68 points
11 comments · 1 min read · LW link
(www.anthropic.com)

Paper: Transformers learn in-context by gradient descent

LawrenceC · Dec 16, 2022, 11:10 AM
28 points
11 comments · 2 min read · LW link
(arxiv.org)

Causal scrubbing: results on induction heads

Dec 3, 2022, 12:59 AM
34 points
1 comment · 17 min read · LW link

Causal scrubbing: results on a paren balance checker

Dec 3, 2022, 12:59 AM
34 points
2 comments · 30 min read · LW link

Causal scrubbing: Appendix

Dec 3, 2022, 12:58 AM
18 points
4 comments · 20 min read · LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

Dec 3, 2022, 12:58 AM
206 points
35 comments · 20 min read · LW link · 1 review

Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

LawrenceC · Oct 26, 2022, 6:45 PM
29 points
5 comments · 1 min read · LW link
(arxiv.org)

LawrenceC’s Shortform

LawrenceC · Oct 8, 2022, 5:17 PM
5 points
18 comments · LW link

Paper: Discovering novel algorithms with AlphaTensor [Deepmind]

LawrenceC · Oct 5, 2022, 4:20 PM
82 points
18 comments · 1 min read · LW link
(www.deepmind.com)

Language models seem to be much better than humans at next-token prediction

Aug 11, 2022, 5:45 PM
182 points
60 comments · 13 min read · LW link · 1 review

High-stakes alignment via adversarial training [Redwood Research report]

May 5, 2022, 12:59 AM
142 points
29 comments · 9 min read · LW link

Book Review: Discrete Mathematics and Its Applications (MIRI Course List)

LawrenceC · Apr 14, 2015, 9:08 AM
21 points
12 comments · 8 min read · LW link