(My understanding of) What Everyone in Technical Alignment is Doing and Why

Aug 29, 2022, 1:23 AM
413 points
90 comments · 37 min read · LW link · 1 review

DeepMind alignment team opinions on AGI ruin arguments

Vika · Aug 12, 2022, 9:06 PM
395 points
37 comments · 14 min read · LW link · 1 review

A Mechanistic Interpretability Analysis of Grokking

Aug 15, 2022, 2:41 AM
373 points
48 comments · 36 min read · LW link · 1 review
(colab.research.google.com)

Two-year update on my personal AI timelines

Ajeya Cotra · Aug 2, 2022, 11:07 PM
293 points
60 comments · 16 min read · LW link

Common misconceptions about OpenAI

Jacob_Hilton · Aug 25, 2022, 2:02 PM
237 points
154 comments · 5 min read · LW link · 1 review

What do ML researchers think about AI in 2022?

KatjaGrace · Aug 4, 2022, 3:40 PM
221 points
33 comments · 3 min read · LW link
(aiimpacts.org)

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth · Aug 10, 2022, 4:08 PM
209 points
34 comments · 3 min read · LW link · 1 review

Worlds Where Iterative Design Fails

johnswentworth · Aug 30, 2022, 8:48 PM
208 points
30 comments · 10 min read · LW link · 1 review

Language models seem to be much better than humans at next-token prediction

Aug 11, 2022, 5:45 PM
182 points
60 comments · 13 min read · LW link · 1 review

Some conceptual alignment research projects

Richard_Ngo · Aug 25, 2022, 10:51 PM
177 points
15 comments · 3 min read · LW link

Shard Theory: An Overview

David Udell · Aug 11, 2022, 5:44 AM
166 points
34 comments · 10 min read · LW link

What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?

johnswentworth · Aug 15, 2022, 10:48 PM
156 points
18 comments · 10 min read · LW link

Your posts should be on arXiv

JanB · Aug 25, 2022, 10:35 AM
156 points
44 comments · 3 min read · LW link

Nate Soares’ Life Advice

CatGoddess · Aug 23, 2022, 2:46 AM
153 points
41 comments · 3 min read · LW link

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth · Aug 8, 2022, 6:05 PM
143 points
13 comments · 3 min read · LW link

How might we align transformative AI if it’s developed very soon?

HoldenKarnofsky · Aug 29, 2022, 3:42 PM
140 points
55 comments · 45 min read · LW link · 1 review

The Parable of the Boy Who Cried 5% Chance of Wolf

KatWoods · Aug 15, 2022, 2:33 PM
140 points
24 comments · 2 min read · LW link

Externalized reasoning oversight: a research direction for language model alignment

tamera · Aug 3, 2022, 12:03 PM
135 points
23 comments · 6 min read · LW link

Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!”

eukaryote · Aug 4, 2022, 8:37 PM
124 points
15 comments · 9 min read · LW link · 1 review
(eukaryotewritesblog.com)

Taking the parameters which seem to matter and rotating them until they don’t

Garrett Baker · Aug 26, 2022, 6:26 PM
120 points
48 comments · 1 min read · LW link

Meditation course claims 65% enlightenment rate: my review

KatWoods · Aug 1, 2022, 11:25 AM
111 points
35 comments · 14 min read · LW link

Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth · Aug 12, 2022, 4:30 PM
110 points
49 comments · 1 min read · LW link

The lessons of Xanadu

jasoncrawford · Aug 7, 2022, 5:59 PM
110 points
20 comments · 8 min read · LW link
(jasoncrawford.org)

Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie · Aug 24, 2022, 6:37 PM
107 points
4 comments · 7 min read · LW link

The alignment problem from a deep learning perspective

Richard_Ngo · Aug 10, 2022, 10:46 PM
107 points
15 comments · 27 min read · LW link · 1 review

How likely is deceptive alignment?

evhub · Aug 30, 2022, 7:34 PM
104 points
28 comments · 60 min read · LW link

Announcing Encultured AI: Building a Video Game

Aug 18, 2022, 2:16 AM
103 points
26 comments · 4 min read · LW link

Rant on Problem Factorization for Alignment

johnswentworth · Aug 5, 2022, 7:23 PM
102 points
53 comments · 6 min read · LW link

Introducing Pastcasting: A tool for forecasting practice

Sage Future · Aug 11, 2022, 5:38 PM
95 points
10 comments · 2 min read · LW link · 2 reviews

Survey advice

KatjaGrace · Aug 24, 2022, 3:10 AM
93 points
11 comments · 3 min read · LW link
(worldspiritsockpuppet.com)

How to do theoretical research, a personal perspective

Mark Xu · Aug 19, 2022, 7:41 PM
91 points
6 comments · 15 min read · LW link

Less Threat-Dependent Bargaining Solutions?? (3/2)

Diffractor · Aug 20, 2022, 2:19 AM
88 points
7 comments · 6 min read · LW link

[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?

TurnTrout · Aug 11, 2022, 10:22 PM
87 points
43 comments · 2 min read · LW link

High Reliability Orgs, and AI Companies

Raemon · Aug 4, 2022, 5:45 AM
86 points
7 comments · 12 min read · LW link · 1 review

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

Aug 12, 2022, 3:17 PM
86 points
4 comments · 3 min read · LW link · 1 review
(vkrakovna.wordpress.com)

I’m mildly skeptical that blindness prevents schizophrenia

Steven Byrnes · Aug 15, 2022, 11:36 PM
83 points
9 comments · 4 min read · LW link

Most Ivy-smart students aren’t at Ivy-tier schools

Aaron Bergman · Aug 7, 2022, 3:18 AM
82 points
7 comments · 8 min read · LW link
(www.aaronbergman.net)

Human Mimicry Mainly Works When We’re Already Close

johnswentworth · Aug 17, 2022, 6:41 PM
81 points
16 comments · 5 min read · LW link

«Boundaries», Part 2: trends in EA’s handling of boundaries

Andrew_Critch · Aug 6, 2022, 12:42 AM
81 points
15 comments · 7 min read · LW link

Paper is published! 100,000 lumens to treat seasonal affective disorder

Fabienne · Aug 20, 2022, 7:48 PM
81 points
3 comments · 1 min read · LW link
(www.lesswrong.com)

The Loire Is Not Dry

jefftk · Aug 20, 2022, 1:40 PM
80 points
2 comments · 1 min read · LW link
(www.jefftk.com)

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon · Aug 20, 2022, 7:48 PM
80 points
125 comments · 1 min read · LW link

Evolution is a bad analogy for AGI: inner alignment

Quintin Pope · Aug 13, 2022, 10:15 PM
79 points
15 comments · 8 min read · LW link

How (not) to choose a research project

Aug 9, 2022, 12:26 AM
79 points
11 comments · 7 min read · LW link

AI strategy nearcasting

HoldenKarnofsky · Aug 25, 2022, 5:26 PM
79 points
4 comments · 9 min read · LW link

The Core of the Alignment Problem is...

Aug 17, 2022, 8:07 PM
76 points
10 comments · 9 min read · LW link

Announcing the Introduction to ML Safety course

Aug 6, 2022, 2:46 AM
73 points
6 comments · 7 min read · LW link

Discovering Agents

zac_kenton · Aug 18, 2022, 5:33 PM
73 points
11 comments · 6 min read · LW link

[Question] COVID-19 Group Test­ing Post-mortem?

gwernAug 5, 2022, 4:32 PM
72 points
6 comments2 min readLW link

$20K In Boun­ties for AI Safety Public Materials

Aug 5, 2022, 2:52 AM
71 points
9 comments6 min readLW link