My hour of memoryless lucidity

Eric Neyman · May 4, 2024, 1:40 AM
369 points
35 comments · 5 min read · LW link
(ericneyman.wordpress.com)

Notifications Received in 30 Minutes of Class

tanagrabeast · May 26, 2024, 5:02 PM
356 points
16 comments · 8 min read · LW link

MIRI 2024 Communications Strategy

Gretta Duleba · May 29, 2024, 7:33 PM
325 points
216 comments · 7 min read · LW link

Non-Disparagement Canaries for OpenAI

May 30, 2024, 7:20 PM
288 points
51 comments · 2 min read · LW link

Truthseeking is the ground in which other principles grow

Elizabeth · May 27, 2024, 1:09 AM
248 points
16 comments · 16 min read · LW link

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman · May 15, 2024, 12:45 AM
246 points
95 comments · 2 min read · LW link

AI companies aren’t really using external evaluators

Zach Stein-Perlman · May 24, 2024, 4:01 PM
242 points
15 comments · 4 min read · LW link

OpenAI: Fallout

Zvi · May 28, 2024, 1:20 PM
204 points
25 comments · 36 min read · LW link
(thezvi.wordpress.com)

Jaan Tallinn’s 2023 Philanthropy Overview

jaan · May 20, 2024, 12:11 PM
203 points
5 comments · 1 min read · LW link
(jaan.info)

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · May 27, 2024, 1:00 PM
202 points
21 comments · 2 min read · LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen · May 21, 2024, 2:22 AM
191 points
13 comments · LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman · May 18, 2024, 3:00 AM
159 points
14 comments · 4 min read · LW link

Deep Honesty

Aletheophile · May 7, 2024, 8:31 PM
159 points
25 comments · 9 min read · LW link

Language Models Model Us

eggsyntax · May 17, 2024, 9:00 PM
158 points
55 comments · 7 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · May 21, 2024, 8:15 PM
157 points
16 comments · 3 min read · LW link

Dyslucksia

Shoshannah Tekofsky · May 9, 2024, 7:21 PM
154 points
45 comments · 6 min read · LW link

OpenAI: Exodus

Zvi · May 20, 2024, 1:10 PM
153 points
26 comments · 44 min read · LW link
(thezvi.wordpress.com)

Value Claims (In Particular) Are Usually Bullshit

johnswentworth · May 30, 2024, 6:26 AM
144 points
18 comments · 2 min read · LW link

The Pearly Gates

lsusr · May 30, 2024, 4:01 AM
127 points
6 comments · 3 min read · LW link

Awakening

lsusr · May 30, 2024, 7:03 AM
123 points
79 comments · 9 min read · LW link

Do you believe in hundred dollar bills lying on the ground? Consider humming

Elizabeth · May 16, 2024, 12:00 AM
122 points
22 comments · 6 min read · LW link
(acesounderglass.com)

[Question] Which skincare products are evidence-based?

Vanessa Kosoy · May 2, 2024, 3:22 PM
120 points
48 comments · 1 min read · LW link

Talent Needs of Technical AI Safety Teams

May 24, 2024, 12:36 AM
117 points
65 comments · 14 min read · LW link

introduction to cancer vaccines

bhauth · May 5, 2024, 1:06 AM
113 points
19 comments · 5 min read · LW link
(www.bhauth.com)

Key takeaways from our EA and alignment research surveys

May 3, 2024, 6:10 PM
111 points
10 comments · 21 min read · LW link

Clarifying METR’s Auditing Role

Beth Barnes · May 30, 2024, 6:41 PM
108 points
1 comment · 2 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 20, 2024, 5:53 PM
105 points
4 comments · 3 min read · LW link

Response to nostalgebraist: proudly waving my moral-antirealist battle flag

Steven Byrnes · May 29, 2024, 4:48 PM
103 points
29 comments · 11 min read · LW link

Advice for Activists from the History of Environmentalism

Jeffrey Heninger · May 16, 2024, 6:40 PM
100 points
8 comments · 6 min read · LW link
(blog.aiimpacts.org)

We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”

Lukas_Gloor · May 9, 2024, 3:43 PM
98 points
36 comments · 5 min read · LW link

Explaining a Math Magic Trick

Robert_AIZI · May 5, 2024, 7:41 PM
97 points
10 comments · 5 min read · LW link

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

May 6, 2024, 7:07 AM
95 points
13 comments · 1 min read · LW link
(arxiv.org)

I am the Golden Gate Bridge

Zvi · May 27, 2024, 2:40 PM
95 points
6 comments · 27 min read · LW link
(thezvi.wordpress.com)

[Question] How to get nerds fascinated about mysterious chronic illness research?

riceissa · May 27, 2024, 10:58 PM
95 points
50 comments · 2 min read · LW link

Apollo Research 1-year update

May 29, 2024, 5:44 PM
93 points
0 comments · 7 min read · LW link

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka · May 3, 2024, 6:10 PM
90 points
11 comments · 4 min read · LW link
(aisafety.dance)

Teaching CS During Take-Off

andrew carle · May 14, 2024, 10:45 PM
89 points
13 comments · 2 min read · LW link

Hardshipification

Jonathan Moregård · May 28, 2024, 8:02 PM
88 points
17 comments · 2 min read · LW link
(honestliving.substack.com)

Review: Conor Moreton’s “Civilization & Cooperation”

Duncan Sabien (Deactivated) · May 26, 2024, 7:32 PM
88 points
8 comments · 38 min read · LW link

MATS Winter 2023-24 Retrospective

May 11, 2024, 12:09 AM
86 points
28 comments · 49 min read · LW link

OpenAI: Helen Toner Speaks

Zvi · May 30, 2024, 9:10 PM
86 points
8 comments · 13 min read · LW link
(thezvi.wordpress.com)

Environmentalism in the United States Is Unusually Partisan

Jeffrey Heninger · May 13, 2024, 9:23 PM
85 points
26 comments · 4 min read · LW link
(blog.aiimpacts.org)

AISafety.com – Resources for AI Safety

May 17, 2024, 3:57 PM
82 points
3 comments · 1 min read · LW link

My thesis (Algorithmic Bayesian Epistemology) explained in more depth

Eric Neyman · May 9, 2024, 7:43 PM
82 points
4 comments · 27 min read · LW link
(ericneyman.wordpress.com)

New voluntary commitments (AI Seoul Summit)

Zach Stein-Perlman · May 21, 2024, 11:00 AM
81 points
17 comments · 7 min read · LW link
(www.gov.uk)

Reward hacking behavior can generalize across tasks

May 28, 2024, 4:33 PM
79 points
5 comments · 21 min read · LW link

LessWrong Community Weekend 2024, open for applications

1 May 2024 10:18 UTC
79 points
2 comments · 7 min read · LW link

Instruction-following AGI is easier and more likely than value aligned AGI

Seth Herd · 15 May 2024 19:38 UTC
79 points
28 comments · 12 min read · LW link

MIRI’s May 2024 Newsletter

Harlan · 15 May 2024 0:13 UTC
79 points
1 comment · 3 min read · LW link
(intelligence.org)

ACX Covid Origins Post convinced readers

ErnestScribbler · 1 May 2024 13:06 UTC
77 points
7 comments · 2 min read · LW link