Takeaways from a survey on AI alignment resources

DanielFilan · Nov 5, 2022, 11:40 PM
73 points
10 comments · 6 min read · LW link · 1 review
(danielfilan.com)

Distinguishing test from training

So8res · Nov 29, 2022, 9:41 PM
72 points
11 comments · 6 min read · LW link

My take on Jacob Cannell’s take on AGI safety

Steven Byrnes · Nov 28, 2022, 2:01 PM
72 points
15 comments · 30 min read · LW link · 1 review

Don’t design agents which exploit adversarial inputs

Nov 18, 2022, 1:48 AM
72 points
64 comments · 12 min read · LW link

Update to Mysteries of mode collapse: text-davinci-002 not RLHF

janus · Nov 19, 2022, 11:51 PM
71 points
8 comments · 2 min read · LW link

Career Scouting: Dentistry

koratkar · Nov 20, 2022, 3:55 PM
69 points
5 comments · 5 min read · LW link
(careerscouting.substack.com)

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky · Nov 29, 2022, 7:30 PM
69 points
10 comments · 33 min read · LW link
(www.cold-takes.com)

Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda · Nov 1, 2022, 11:56 PM
69 points
16 comments · 1 min read · LW link
(youtu.be)

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Robert Miles · Nov 1, 2022, 11:23 PM
68 points
105 comments · 2 min read · LW link

Deontology and virtue ethics as “effective theories” of consequentialist ethics

Jan_Kulveit · Nov 17, 2022, 2:11 PM
68 points
9 comments · LW link · 1 review

2022 LessWrong Census?

SurfingOrca · Nov 7, 2022, 5:16 AM
67 points
13 comments · 1 min read · LW link

The First Filter

Nov 26, 2022, 7:37 PM
67 points
5 comments · 1 min read · LW link

Against “Classic Style”

Cleo Nardo · Nov 23, 2022, 10:10 PM
67 points
30 comments · 4 min read · LW link

Clarifying wireheading terminology

leogao · Nov 24, 2022, 4:53 AM
66 points
6 comments · 1 min read · LW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · Nov 29, 2022, 6:23 AM
62 points
41 comments · 15 min read · LW link

Announcing AI safety Mentors and Mentees

Marius Hobbhahn · Nov 23, 2022, 3:21 PM
62 points
7 comments · 10 min read · LW link

Against a General Factor of Doom

Jeffrey Heninger · Nov 23, 2022, 4:50 PM
61 points
19 comments · 4 min read · LW link · 1 review
(aiimpacts.org)

Could a single alien message destroy us?

Nov 25, 2022, 7:32 AM
61 points
23 comments · 6 min read · LW link
(youtu.be)

FTX will probably be sold at a steep discount. What we know and some forecasts on what will happen next

Nathan Young · Nov 9, 2022, 2:14 AM
60 points
21 comments · LW link

The Least Controversial Application of Geometric Rationality

Scott Garrabrant · Nov 25, 2022, 4:50 PM
60 points
22 comments · 4 min read · LW link

New Frontiers in Mojibake

Adam Scherlis · Nov 26, 2022, 2:37 AM
60 points
7 comments · 6 min read · LW link · 1 review
(adam.scherlis.com)

What’s the Deal with Elon Musk and Twitter?

Zvi · Nov 7, 2022, 1:50 PM
60 points
13 comments · 31 min read · LW link
(thezvi.wordpress.com)

Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide

Andrew_Critch · Nov 24, 2022, 9:16 PM
58 points
35 comments · 3 min read · LW link · 1 review

Humans do acausal coordination all the time

Adam Jermyn · Nov 2, 2022, 2:40 PM
57 points
35 comments · 3 min read · LW link

Some advice on independent research

Marius Hobbhahn · Nov 8, 2022, 2:46 PM
56 points
5 comments · 10 min read · LW link

A philosopher’s critique of RLHF

TW123 · Nov 7, 2022, 2:42 AM
55 points
8 comments · 2 min read · LW link

Human-level Diplomacy was my fire alarm

Lao Mein · Nov 23, 2022, 10:05 AM
54 points
15 comments · 3 min read · LW link

Announcing Nonlinear Emergency Funding

KatWoods · Nov 13, 2022, 7:02 PM
54 points
0 comments · LW link

Kelsey Piper’s recent interview of SBF

agucova · Nov 16, 2022, 8:30 PM
51 points
29 comments · LW link

Human-level Full-Press Diplomacy (some bare facts).

Cleo Nardo · Nov 22, 2022, 8:59 PM
50 points
7 comments · 3 min read · LW link

Noting an unsubstantiated communal belief about the FTX disaster

Yitz · Nov 13, 2022, 5:37 AM
50 points
52 comments · LW link

What’s the Alternative to Independence?

jefftk · Nov 13, 2022, 3:30 PM
50 points
3 comments · 1 min read · LW link
(www.jefftk.com)

“Rudeness”, a useful coordination mechanic

Raemon · Nov 11, 2022, 10:27 PM
49 points
20 comments · 2 min read · LW link

Developer experience for the motivation

Adam Zerner · Nov 16, 2022, 7:12 AM
49 points
7 comments · 4 min read · LW link

Don’t align agents to evaluations of plans

TurnTrout · Nov 26, 2022, 9:16 PM
48 points
49 comments · 18 min read · LW link

Information Markets

eva_ · Nov 2, 2022, 1:24 AM
46 points
6 comments · 12 min read · LW link

A Mystery About High Dimensional Concept Encoding

Fabien Roger · Nov 3, 2022, 5:05 PM
46 points
13 comments · 7 min read · LW link

A Short Dialogue on the Meaning of Reward Functions

Nov 19, 2022, 9:04 PM
45 points
0 comments · 3 min read · LW link

For ELK truth is mostly a distraction

c.trout · Nov 4, 2022, 9:14 PM
44 points
0 comments · 21 min read · LW link

The FTX Saga—Simplified

Annapurna · Nov 16, 2022, 2:42 AM
44 points
10 comments · 7 min read · LW link
(jorgevelez.substack.com)

Spectrum of Independence

jefftk · Nov 5, 2022, 2:40 AM
43 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Rationalist Town Hall: FTX Fallout Edition (RSVP Required)

Ben Pace · Nov 23, 2022, 1:38 AM
43 points
13 comments · 2 min read · LW link

The biological function of love for non-kin is to gain the trust of people we cannot deceive

chaosmage · Nov 7, 2022, 8:26 PM
43 points
3 comments · 8 min read · LW link

The optimal angle for a solar boiler is different than for a solar panel

Yair Halberstadt · Nov 10, 2022, 10:32 AM
42 points
4 comments · 2 min read · LW link

We must be very clear: fraud in the service of effective altruism is unacceptable

evhub · Nov 10, 2022, 11:31 PM
42 points
56 comments · LW link

Weekly Roundup #4

Zvi · Nov 4, 2022, 3:00 PM
42 points
1 comment · 6 min read · LW link
(thezvi.wordpress.com)

A newcomer’s guide to the technical AI safety field

zeshen · Nov 4, 2022, 2:29 PM
42 points
3 comments · 10 min read · LW link

Why square errors?

Aprillion · Nov 26, 2022, 1:40 PM
41 points
11 comments · 2 min read · LW link

Counterfactability

Scott Garrabrant · Nov 7, 2022, 5:39 AM
40 points
5 comments · 11 min read · LW link

Scott Aaronson on “Reform AI Alignment”

Shmi · Nov 20, 2022, 10:20 PM
39 points
17 comments · 1 min read · LW link
(scottaaronson.blog)