Safetywashing

Adam Scholl · Jul 1, 2022, 11:56 AM
260 points
20 comments · 1 min read · LW link · 2 reviews

So, geez there’s a lot of AI content these days

Raemon · Oct 6, 2022, 9:32 PM
258 points
140 comments · 6 min read · LW link

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter · Jul 19, 2022, 6:06 PM
256 points
72 comments · 1 min read · LW link

The shard theory of human values

Sep 4, 2022, 4:28 AM
255 points
67 comments · 24 min read · LW link · 2 reviews

AI alignment is distinct from its near-term applications

paulfchristiano · Dec 13, 2022, 7:10 AM
255 points
21 comments · 2 min read · LW link
(ai-alignment.com)

New Scaling Laws for Large Language Models

1a3orn · Apr 1, 2022, 8:41 PM
246 points
22 comments · 5 min read · LW link

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin · Dec 15, 2022, 6:22 PM
244 points
39 comments · 16 min read · LW link · 1 review

A Quick Guide to Confronting Doom

Ruby · Apr 13, 2022, 7:30 PM
243 points
33 comments · 2 min read · LW link

Jailbreaking ChatGPT on Release Day

Zvi · Dec 2, 2022, 1:10 PM
242 points
77 comments · 6 min read · LW link · 1 review
(thezvi.wordpress.com)

Slow motion videos as AI risk intuition pumps

Andrew_Critch · Jun 14, 2022, 7:31 PM
241 points
41 comments · 2 min read · LW link · 1 review

The Plan − 2022 Update

johnswentworth · Dec 1, 2022, 8:43 PM
239 points
37 comments · 8 min read · LW link · 1 review

Common misconceptions about OpenAI

Jacob_Hilton · Aug 25, 2022, 2:02 PM
237 points
154 comments · 5 min read · LW link · 1 review

Contra Hofstadter on GPT-3 Nonsense

rictic · Jun 15, 2022, 9:53 PM
237 points
24 comments · 2 min read · LW link

Introduction to abstract entropy

Alex_Altair · Oct 20, 2022, 9:03 PM
237 points
78 comments · 18 min read · LW link · 1 review

An Observation of Vavilov Day

Elizabeth · Jan 3, 2022, 9:10 PM
236 points
42 comments · 3 min read · LW link
(acesounderglass.com)

Announcing Balsa Research

Zvi · Sep 25, 2022, 10:50 PM
235 points
64 comments · 2 min read · LW link · 1 review
(thezvi.wordpress.com)

ProjectLawful.com: Eliezer’s latest story, past 1M words

Eliezer Yudkowsky · May 11, 2022, 6:18 AM
234 points
112 comments · 1 min read · LW link · 4 reviews

Editing Advice for LessWrong Users

JustisMills · Apr 11, 2022, 4:32 PM
233 points
14 comments · 6 min read · LW link · 1 review

(briefly) RaDVaC and SMTM, two things we should be doing

Eliezer Yudkowsky · Jan 12, 2022, 6:20 AM
230 points
79 comments · 3 min read · LW link · 1 review

AGI Safety FAQ / all-dumb-questions-allowed thread

Aryeh Englander · Jun 7, 2022, 5:47 AM
227 points
526 comments · 4 min read · LW link

Moses and the Class Struggle

lsusr · Apr 1, 2022, 11:55 AM
225 points
26 comments · 5 min read · LW link

Replacing Karma with Good Heart Tokens (Worth $1!)

Apr 1, 2022, 9:31 AM
225 points
173 comments · 4 min read · LW link

How I buy things when Lightcone wants them fast

Bird Concept · Sep 26, 2022, 5:02 AM
224 points
21 comments · 8 min read · LW link

What do ML researchers think about AI in 2022?

KatjaGrace · Aug 4, 2022, 3:40 PM
221 points
33 comments · 3 min read · LW link
(aiimpacts.org)

Lessons learned from talking to >100 academics about AI safety

Marius Hobbhahn · Oct 10, 2022, 1:16 PM
216 points
18 comments · 12 min read · LW link · 1 review

Humans provide an untapped wealth of evidence about alignment

Jul 14, 2022, 2:31 AM
212 points
94 comments · 9 min read · LW link · 1 review

Unifying Bargaining Notions (1/2)

Diffractor · Jul 25, 2022, 12:28 AM
210 points
41 comments · 16 min read · LW link

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth · Aug 10, 2022, 4:08 PM
209 points
34 comments · 3 min read · LW link · 1 review

Visible Homelessness in SF: A Quick Breakdown of Causes

alyssavance · May 25, 2022, 1:40 AM
209 points
32 comments · 2 min read · LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes · Oct 25, 2022, 2:47 PM
208 points
49 comments · 30 min read · LW link · 1 review

Worlds Where Iterative Design Fails

johnswentworth · Aug 30, 2022, 8:48 PM
208 points
30 comments · 10 min read · LW link · 1 review

What it’s like to dissect a cadaver

Alok Singh · Nov 10, 2022, 6:40 AM
208 points
24 comments · 5 min read · LW link
(alok.github.io)

Benign Boundary Violations

Duncan Sabien (Deactivated) · May 26, 2022, 6:48 AM
207 points
84 comments · 18 min read · LW link · 1 review

Call For Distillers

johnswentworth · Apr 4, 2022, 6:25 PM
207 points
43 comments · 3 min read · LW link · 1 review

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

Dec 3, 2022, 12:58 AM
206 points
35 comments · 20 min read · LW link · 1 review

I Converted Book I of The Sequences Into A Zoomer-Readable Format

dkirmani · Nov 10, 2022, 2:59 AM
200 points
32 comments · 2 min read · LW link

Brain Efficiency: Much More than You Wanted to Know

jacob_cannell · Jan 6, 2022, 3:38 AM
200 points
103 comments · 29 min read · LW link

Butterfly Ideas

Elizabeth · Feb 22, 2022, 7:40 AM
200 points
10 comments · 3 min read · LW link · 2 reviews
(acesounderglass.com)

A concrete bet offer to those with short AGI timelines

Apr 9, 2022, 9:41 PM
199 points
120 comments · 5 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

Nov 28, 2022, 12:54 PM
199 points
33 comments · 31 min read · LW link

Do a cost-benefit analysis of your technology usage

TurnTrout · Mar 27, 2022, 11:09 PM
198 points
53 comments · 13 min read · LW link

We Are Conjecture, A New Alignment Research Startup

Connor Leahy · Apr 8, 2022, 11:40 AM
197 points
25 comments · 4 min read · LW link

A note about differential technological development

So8res · Jul 15, 2022, 4:46 AM
197 points
33 comments · 6 min read · LW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · Jul 22, 2022, 6:44 PM
195 points
29 comments · 14 min read · LW link
(theinsideview.ai)

How my team at Lightcone sometimes gets stuff done

Bird Concept · Sep 19, 2022, 5:47 AM
192 points
43 comments · 7 min read · LW link · 1 review

On saving one’s world

Rob Bensinger · May 17, 2022, 7:53 PM
192 points
4 comments · 1 min read · LW link

Tyranny of the Epistemic Majority

Scott Garrabrant · Nov 22, 2022, 5:19 PM
192 points
13 comments · 9 min read · LW link · 1 review

Deliberate Grieving

Raemon · May 30, 2022, 8:49 PM
188 points
16 comments · 9 min read · LW link · 2 reviews

Intro to Naturalism: Orientation

Feb 13, 2022, 7:52 AM
187 points
23 comments · 7 min read · LW link · 2 reviews

Have You Tried Hiring People?

rank-biserial · Mar 2, 2022, 2:06 AM
185 points
117 comments · 8 min read · LW link · 1 review