(My understanding of) What Everyone in Technical Alignment is Doing and Why

29 Aug 2022 1:23 UTC
413 points
90 comments · 37 min read · 1 review

DeepMind alignment team opinions on AGI ruin arguments

Vika · 12 Aug 2022 21:06 UTC
395 points
37 comments · 14 min read · 1 review

A Mechanistic Interpretability Analysis of Grokking

15 Aug 2022 2:41 UTC
373 points
47 comments · 36 min read · 1 review
(colab.research.google.com)

Two-year update on my personal AI timelines

Ajeya Cotra · 2 Aug 2022 23:07 UTC
293 points
60 comments · 16 min read

Common misconceptions about OpenAI

Jacob_Hilton · 25 Aug 2022 14:02 UTC
251 points
147 comments · 5 min read · 1 review

What do ML researchers think about AI in 2022?

KatjaGrace · 4 Aug 2022 15:40 UTC
221 points
33 comments · 3 min read
(aiimpacts.org)

Worlds Where Iterative Design Fails

johnswentworth · 30 Aug 2022 20:48 UTC
207 points
30 comments · 10 min read · 1 review

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth · 10 Aug 2022 16:08 UTC
202 points
34 comments · 3 min read · 1 review

Language models seem to be much better than humans at next-token prediction

11 Aug 2022 17:45 UTC
182 points
60 comments · 13 min read · 1 review

Some conceptual alignment research projects

Richard_Ngo · 25 Aug 2022 22:51 UTC
176 points
15 comments · 3 min read

Shard Theory: An Overview

David Udell · 11 Aug 2022 5:44 UTC
166 points
34 comments · 10 min read

Your posts should be on arXiv

JanB · 25 Aug 2022 10:35 UTC
155 points
44 comments · 3 min read

Nate Soares’ Life Advice

CatGoddess · 23 Aug 2022 2:46 UTC
151 points
41 comments · 3 min read

What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?

johnswentworth · 15 Aug 2022 22:48 UTC
146 points
18 comments · 10 min read

The Parable of the Boy Who Cried 5% Chance of Wolf

KatWoods · 15 Aug 2022 14:33 UTC
140 points
24 comments · 2 min read

How might we align transformative AI if it’s developed very soon?

HoldenKarnofsky · 29 Aug 2022 15:42 UTC
139 points
55 comments · 45 min read · 1 review

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth · 8 Aug 2022 18:05 UTC
130 points
12 comments · 3 min read

Externalized reasoning oversight: a research direction for language model alignment

tamera · 3 Aug 2022 12:03 UTC
130 points
23 comments · 6 min read

Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!”

eukaryote · 4 Aug 2022 20:37 UTC
124 points
15 comments · 9 min read · 1 review
(eukaryotewritesblog.com)

Taking the parameters which seem to matter and rotating them until they don’t

Garrett Baker · 26 Aug 2022 18:26 UTC
120 points
48 comments · 1 min read

Meditation course claims 65% enlightenment rate: my review

KatWoods · 1 Aug 2022 11:25 UTC
111 points
35 comments · 14 min read

The lessons of Xanadu

jasoncrawford · 7 Aug 2022 17:59 UTC
110 points
20 comments · 8 min read
(jasoncrawford.org)

The alignment problem from a deep learning perspective

Richard_Ngo · 10 Aug 2022 22:46 UTC
107 points
15 comments · 27 min read · 1 review

Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie · 24 Aug 2022 18:37 UTC
107 points
4 comments · 7 min read

Announcing Encultured AI: Building a Video Game

18 Aug 2022 2:16 UTC
103 points
26 comments · 4 min read

How likely is deceptive alignment?

evhub · 30 Aug 2022 19:34 UTC
103 points
28 comments · 60 min read

Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth · 12 Aug 2022 16:30 UTC
101 points
49 comments · 1 min read

everything is okay

Tamsin Leake · 23 Aug 2022 9:20 UTC
100 points
22 comments · 7 min read · 2 reviews

Introducing Pastcasting: A tool for forecasting practice

Sage Future · 11 Aug 2022 17:38 UTC
95 points
10 comments · 2 min read · 2 reviews

Survey advice

KatjaGrace · 24 Aug 2022 3:10 UTC
93 points
11 comments · 3 min read
(worldspiritsockpuppet.com)

Rant on Problem Factorization for Alignment

johnswentworth · 5 Aug 2022 19:23 UTC
92 points
53 comments · 6 min read

How to do theoretical research, a personal perspective

Mark Xu · 19 Aug 2022 19:41 UTC
91 points
6 comments · 15 min read

Less Threat-Dependent Bargaining Solutions?? (3/2)

Diffractor · 20 Aug 2022 2:19 UTC
88 points
7 comments · 6 min read

[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?

TurnTrout · 11 Aug 2022 22:22 UTC
87 points
42 comments · 2 min read

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

12 Aug 2022 15:17 UTC
86 points
4 comments · 3 min read · 1 review
(vkrakovna.wordpress.com)

High Reliability Orgs, and AI Companies

Raemon · 4 Aug 2022 5:45 UTC
86 points
7 comments · 12 min read · 1 review

I’m mildly skeptical that blindness prevents schizophrenia

Steven Byrnes · 15 Aug 2022 23:36 UTC
83 points
9 comments · 4 min read

Most Ivy-smart students aren’t at Ivy-tier schools

Aaron Bergman · 7 Aug 2022 3:18 UTC
82 points
7 comments · 8 min read
(www.aaronbergman.net)

«Boundaries», Part 2: trends in EA’s handling of boundaries

Andrew_Critch · 6 Aug 2022 0:42 UTC
81 points
14 comments · 7 min read

Human Mimicry Mainly Works When We’re Already Close

johnswentworth · 17 Aug 2022 18:41 UTC
81 points
16 comments · 5 min read

Paper is published! 100,000 lumens to treat seasonal affective disorder

Fabienne · 20 Aug 2022 19:48 UTC
81 points
3 comments · 1 min read
(www.lesswrong.com)

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon · 20 Aug 2022 19:48 UTC
80 points
125 comments · 1 min read

The Loire Is Not Dry

jefftk · 20 Aug 2022 13:40 UTC
80 points
2 comments · 1 min read
(www.jefftk.com)

Evolution is a bad analogy for AGI: inner alignment

Quintin Pope · 13 Aug 2022 22:15 UTC
79 points
15 comments · 8 min read

How (not) to choose a research project

9 Aug 2022 0:26 UTC
79 points
11 comments · 7 min read

AI strategy nearcasting

HoldenKarnofsky · 25 Aug 2022 17:26 UTC
79 points
4 comments · 9 min read

The Core of the Alignment Problem is...

17 Aug 2022 20:07 UTC
76 points
10 comments · 9 min read

Announcing the Introduction to ML Safety course

6 Aug 2022 2:46 UTC
73 points
6 comments · 7 min read

Discovering Agents

zac_kenton · 18 Aug 2022 17:33 UTC
73 points
11 comments · 6 min read

[Question] COVID-19 Group Testing Post-mortem?

gwern · 5 Aug 2022 16:32 UTC
72 points
6 comments · 2 min read