Mysteries of mode collapse

janus · Nov 8, 2022, 10:37 AM
284 points
57 comments · 14 min read · LW link · 1 review

What it’s like to dissect a cadaver

Alok Singh · Nov 10, 2022, 6:40 AM
208 points
24 comments · 5 min read · LW link
(alok.github.io)

I Converted Book I of The Sequences Into A Zoomer-Readable Format

dkirmani · Nov 10, 2022, 2:59 AM
200 points
32 comments · 2 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

Nov 28, 2022, 12:54 PM
199 points
33 comments · 31 min read · LW link

Tyranny of the Epistemic Majority

Scott Garrabrant · Nov 22, 2022, 5:19 PM
192 points
13 comments · 9 min read · LW link · 1 review

Conjecture: a retrospective after 8 months of work

Nov 23, 2022, 5:10 PM
180 points
9 comments · 8 min read · LW link

Geometric Rationality is Not VNM Rational

Scott Garrabrant · Nov 27, 2022, 7:36 PM
176 points
27 comments · 3 min read · LW link

Planes are still decades away from displacing most bird jobs

guzey · Nov 25, 2022, 4:49 PM
168 points
13 comments · 3 min read · LW link

The Geometric Expectation

Scott Garrabrant · Nov 23, 2022, 6:05 PM
159 points
22 comments · 4 min read · LW link

Mechanistic anomaly detection and ELK

paulfchristiano · Nov 25, 2022, 6:50 PM
138 points
22 comments · 21 min read · LW link
(ai-alignment.com)

The Alignment Community Is Culturally Broken

sudo · Nov 13, 2022, 6:53 PM
136 points
68 comments · 2 min read · LW link

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

Nov 22, 2022, 6:57 PM
134 points
97 comments · 24 min read · LW link

Sadly, FTX

Zvi · Nov 17, 2022, 2:30 PM
133 points
18 comments · 47 min read · LW link
(thezvi.wordpress.com)

Clarifying AI X-risk

Nov 1, 2022, 11:03 AM
127 points
24 comments · 4 min read · LW link · 1 review

On the Diplomacy AI

Zvi · Nov 28, 2022, 1:20 PM
127 points
29 comments · 11 min read · LW link
(thezvi.wordpress.com)

Geometric Exploration, Arithmetic Exploitation

Scott Garrabrant · Nov 24, 2022, 3:36 PM
126 points
5 comments · 7 min read · LW link

Utilitarianism Meets Egalitarianism

Scott Garrabrant · Nov 21, 2022, 7:00 PM
121 points
16 comments · 6 min read · LW link · 1 review

Speculation on Current Opportunities for Unusually High Impact in Global Health

johnswentworth · Nov 11, 2022, 8:47 PM
114 points
31 comments · 4 min read · LW link

How could we know that an AGI system will have good consequences?

So8res · Nov 7, 2022, 10:42 PM
111 points
25 comments · 5 min read · LW link

Applying superintelligence without collusion

Eric Drexler · Nov 8, 2022, 6:08 PM
109 points
63 comments · 4 min read · LW link

What I Learned Running Refine

adamShimi · Nov 24, 2022, 2:49 PM
108 points
5 comments · 4 min read · LW link

Caution when interpreting Deepmind’s In-context RL paper

Sam Marks · Nov 1, 2022, 2:42 AM
105 points
8 comments · 4 min read · LW link

Here’s the exit.

Valentine · Nov 21, 2022, 6:07 PM
105 points
180 comments · 10 min read · LW link · 5 reviews

Instrumental convergence is what makes general intelligence possible

tailcalled · Nov 11, 2022, 4:38 PM
105 points
11 comments · 4 min read · LW link

LW Beta Feature: Side-Comments

jimrandomh · Nov 24, 2022, 1:55 AM
103 points
47 comments · 1 min read · LW link

LessWrong readers are invited to apply to the Lurkshop

Nov 22, 2022, 9:19 AM
101 points
41 comments · 3 min read · LW link

Instead of technical research, more people should focus on buying time

Nov 5, 2022, 8:43 PM
100 points
45 comments · 14 min read · LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner · Nov 20, 2022, 1:22 AM
97 points
2 comments · 2 min read · LW link
(arxiv.org)

Searching for Search

Nov 28, 2022, 3:31 PM
97 points
9 comments · 14 min read · LW link · 1 review

Trying to Make a Treacherous Mesa-Optimizer

MadHatter · Nov 9, 2022, 6:07 PM
95 points
14 comments · 4 min read · LW link
(attentionspan.blog)

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese Anthis · Nov 22, 2022, 4:50 PM
93 points
64 comments · 1 min read · LW link
(www.science.org)

Conjecture Second Hiring Round

Nov 23, 2022, 5:11 PM
92 points
0 comments · 1 min read · LW link

Current themes in mechanistic interpretability research

Nov 16, 2022, 2:14 PM
89 points
2 comments · 12 min read · LW link

When AI solves a game, focus on the game’s mechanics, not its theme.

Cleo Nardo · Nov 23, 2022, 7:16 PM
89 points
7 comments · 2 min read · LW link

By Default, GPTs Think In Plain Sight

Fabien Roger · Nov 19, 2022, 7:15 PM
88 points
36 comments · 9 min read · LW link

Announcing the Progress Forum

jasoncrawford · Nov 17, 2022, 7:26 PM
83 points
9 comments · 1 min read · LW link

Always know where your abstractions break

lsusr · Nov 27, 2022, 6:32 AM
82 points
6 comments · 2 min read · LW link

Results from the interpretability hackathon

Nov 17, 2022, 2:51 PM
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)

Exams-Only Universities

Mati_Roy · Nov 6, 2022, 10:05 PM
80 points
40 comments · 2 min read · LW link

What is epigenetics?

Metacelsus · Nov 6, 2022, 1:24 AM
78 points
4 comments · 6 min read · LW link
(denovo.substack.com)

Threat Model Literature Review

Nov 1, 2022, 11:03 AM
78 points
4 comments · 25 min read · LW link

Elastic Productivity Tools

Simon Berens · Nov 19, 2022, 9:59 PM
76 points
8 comments · 2 min read · LW link
(simonberens.me)

Follow up to medical miracle

Elizabeth · Nov 4, 2022, 6:00 PM
76 points
5 comments · 6 min read · LW link
(acesounderglass.com)

Engineering Monosemanticity in Toy Models

Nov 18, 2022, 1:43 AM
75 points
7 comments · 3 min read · LW link
(arxiv.org)

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn · Nov 16, 2022, 2:40 PM
75 points
17 comments · 7 min read · LW link · 1 review

Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos · Nov 14, 2022, 4:42 PM
75 points
12 comments · 2 min read · LW link
(epochai.org)

K-types vs T-types — what priors do you have?

Cleo Nardo · Nov 3, 2022, 11:29 AM
74 points
25 comments · 7 min read · LW link

Respecting your Local Preferences

Scott Garrabrant · Nov 26, 2022, 7:04 PM
73 points
1 comment · 4 min read · LW link

Takeaways from a survey on AI alignment resources

DanielFilan · Nov 5, 2022, 11:40 PM
73 points
10 comments · 6 min read · LW link · 1 review
(danielfilan.com)

Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility

Nov 22, 2022, 10:19 PM
73 points
20 comments · 4 min read · LW link