How much do you believe your results?

Eric Neyman · May 6, 2023, 8:31 PM
507 points
18 comments · 15 min read · LW link · 4 reviews
(ericneyman.wordpress.com)

Steering GPT-2-XL by adding an activation vector

May 13, 2023, 6:42 PM
437 points
98 comments · 50 min read · LW link · 1 review

Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures

Dan H · May 30, 2023, 9:05 AM
382 points
78 comments · 1 min read · LW link · 1 review
(www.safe.ai)

How to have Polygenically Screened Children

GeneSmith · May 7, 2023, 4:01 PM
367 points
128 comments · 27 min read · LW link · 1 review

Book Review: How Minds Change

bc4026bd4aaa5b7fe · May 25, 2023, 5:55 PM
312 points
52 comments · 15 min read · LW link

Predictable updating about AI risk

Joe Carlsmith · May 8, 2023, 9:53 PM
293 points
25 comments · 36 min read · LW link · 1 review

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

Andrew_Critch · May 24, 2023, 12:02 AM
268 points
39 comments · 8 min read · LW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

May 10, 2023, 7:04 PM
256 points
54 comments · 21 min read · LW link

Announcing Apollo Research

May 30, 2023, 4:17 PM
217 points
11 comments · 8 min read · LW link

Twiblings, four-parent babies and other reproductive technology

GeneSmith · May 20, 2023, 5:11 PM
191 points
33 comments · 6 min read · LW link

When is Goodhart catastrophic?

May 9, 2023, 3:59 AM
180 points
29 comments · 8 min read · LW link · 1 review

Decision Theory with the Magic Parts Highlighted

moridinamael · May 16, 2023, 5:39 PM
175 points
24 comments · 5 min read · LW link

Prizes for matrix completion problems

paulfchristiano · May 3, 2023, 11:30 PM
164 points
52 comments · 1 min read · LW link
(www.alignment.org)

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · May 22, 2023, 2:31 PM
155 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Request: stop advancing AI capabilities

So8res · May 26, 2023, 5:42 PM
154 points
24 comments · 1 min read · LW link

Advice for newly busy people

Severin T. Seehrich · May 11, 2023, 4:46 PM
150 points
3 comments · 5 min read · LW link

Dark Forest Theories

Raemon · May 12, 2023, 8:21 PM
145 points
53 comments · 2 min read · LW link · 2 reviews

Sentience matters

So8res · May 29, 2023, 9:25 PM
143 points
96 comments · 2 min read · LW link

A brief collection of Hinton’s recent comments on AGI risk

Kaj_Sotala · May 4, 2023, 11:31 PM
143 points
9 comments · 11 min read · LW link

Clarifying and predicting AGI

Richard_Ngo · May 4, 2023, 3:55 PM
142 points
45 comments · 4 min read · LW link

LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem

Steven Byrnes · May 8, 2023, 7:35 PM
140 points
37 comments · 15 min read · LW link

Trust develops gradually via making bids and setting boundaries

Richard_Ngo · May 19, 2023, 10:16 PM
134 points
12 comments · 4 min read · LW link

AGI safety career advice

Richard_Ngo · May 2, 2023, 7:36 AM
132 points
24 comments · 13 min read · LW link

From fear to excitement

Richard_Ngo · May 15, 2023, 6:23 AM
131 points
9 comments · 3 min read · LW link

Some background for reasoning about dual-use alignment research

Charlie Steiner · May 18, 2023, 2:50 PM
126 points
22 comments · 9 min read · LW link · 1 review

Who regulates the regulators? We need to go beyond the review-and-approval paradigm

jasoncrawford · May 4, 2023, 10:11 PM
122 points
29 comments · 13 min read · LW link
(rootsofprogress.org)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment · 10 min read · LW link

New User’s Guide to LessWrong

Ruby · May 17, 2023, 12:55 AM
113 points
53 comments · 11 min read · LW link · 1 review

Investigating Fabrication

LoganStrohl · May 18, 2023, 5:46 PM
112 points
14 comments · 16 min read · LW link

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren Elverlin · May 12, 2023, 6:07 PM
105 points
9 comments · 3 min read · LW link

AI Safety in China: Part 2

Lao Mein · May 22, 2023, 2:50 PM
103 points
28 comments · 2 min read · LW link

Bayesian Networks Aren’t Necessarily Causal

Zack_M_Davis · May 14, 2023, 1:42 AM
102 points
38 comments · 8 min read · LW link · 1 review

Open Thread With Experimental Feature: Reactions

jimrandomh · May 24, 2023, 4:46 PM
101 points
189 comments · 3 min read · LW link

A Case for the Least Forgiving Take On Alignment

Thane Ruthenis · May 2, 2023, 9:34 PM
100 points
85 comments · 22 min read · LW link

Geoff Hinton Quits Google

Adam Shai · May 1, 2023, 9:03 PM
98 points
14 comments · 1 min read · LW link

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

May 1, 2023, 4:47 PM
96 points
10 comments · 30 min read · LW link

Judgments often smuggle in implicit standards

Richard_Ngo · May 15, 2023, 6:50 PM
95 points
4 comments · 3 min read · LW link

Most people should probably feel safe most of the time

Kaj_Sotala · May 9, 2023, 9:35 AM
95 points
28 comments · 10 min read · LW link

What if they gave an Industrial Revolution and nobody came?

jasoncrawford · May 17, 2023, 7:41 PM
94 points
10 comments · 19 min read · LW link
(rootsofprogress.org)

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman · May 25, 2023, 3:00 AM
94 points
12 comments · 1 min read · LW link · 1 review
(arxiv.org)

Yoshua Bengio: How Rogue AIs may Arise

harfe · May 23, 2023, 6:28 PM
92 points
12 comments · 18 min read · LW link
(yoshuabengio.org)

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre Variengien · May 12, 2023, 9:41 AM
92 points
0 comments · 33 min read · LW link

An artificially structured argument for expecting AGI ruin

Rob Bensinger · May 7, 2023, 9:52 PM
91 points
26 comments · 19 min read · LW link

Coercion is an adaptation to scarcity; trust is an adaptation to abundance

Richard_Ngo · May 23, 2023, 6:14 PM
90 points
11 comments · 4 min read · LW link

The bullseye framework: My case against AI doom

titotal · May 30, 2023, 11:52 AM
89 points
35 comments · LW link

An Analogy for Understanding Transformers

CallumMcDougall · May 13, 2023, 12:20 PM
89 points
6 comments · 9 min read · LW link

LessWrong Community Weekend 2023 [Applications now closed]

Henry Prowbell · May 1, 2023, 9:08 AM
89 points
0 comments · 6 min read · LW link

Conditional Prediction with Zero-Sum Training Solves Self-Fulfilling Prophecies

May 26, 2023, 5:44 PM
88 points
13 comments · 24 min read · LW link

Reacts now enabled on 100% of posts, though still just experimenting

Ruby · May 28, 2023, 5:36 AM
88 points
73 comments · 2 min read · LW link

We learn long-lasting strategies to protect ourselves from danger and rejection

Richard_Ngo · May 16, 2023, 4:36 PM
85 points
5 comments · 5 min read · LW link