My hour of memoryless lucidity

Eric Neyman, 4 May 2024 1:40 UTC
358 points
35 comments, 5 min read, LW link
(ericneyman.wordpress.com)

Notifications Received in 30 Minutes of Class

tanagrabeast, 26 May 2024 17:02 UTC
353 points
16 comments, 8 min read, LW link

MIRI 2024 Communications Strategy

Gretta Duleba, 29 May 2024 19:33 UTC
319 points
202 comments, 7 min read, LW link

Non-Disparagement Canaries for OpenAI

30 May 2024 19:20 UTC
287 points
51 comments, 2 min read, LW link

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman, 15 May 2024 0:45 UTC
246 points
95 comments, 2 min read, LW link

Truthseeking is the ground in which other principles grow

Elizabeth, 27 May 2024 1:09 UTC
242 points
16 comments, 16 min read, LW link

AI companies aren’t really using external evaluators

Zach Stein-Perlman, 24 May 2024 16:01 UTC
240 points
15 comments, 4 min read, LW link

OpenAI: Fallout

Zvi, 28 May 2024 13:20 UTC
204 points
25 comments, 36 min read, LW link
(thezvi.wordpress.com)

Jaan Tallinn’s 2023 Philanthropy Overview

jaan, 20 May 2024 12:11 UTC
201 points
5 comments, 1 min read, LW link
(jaan.info)

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman, 27 May 2024 13:00 UTC
199 points
21 comments, 2 min read, LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen, 21 May 2024 2:22 UTC
191 points
13 comments, 1 min read, LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman, 18 May 2024 3:00 UTC
159 points
14 comments, 4 min read, LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper, 21 May 2024 20:15 UTC
157 points
16 comments, 3 min read, LW link

Language Models Model Us

eggsyntax, 17 May 2024 21:00 UTC
156 points
55 comments, 7 min read, LW link

Deep Honesty

Aletheophile, 7 May 2024 20:31 UTC
156 points
25 comments, 9 min read, LW link

Dyslucksia

Shoshannah Tekofsky, 9 May 2024 19:21 UTC
154 points
45 comments, 6 min read, LW link

OpenAI: Exodus

Zvi, 20 May 2024 13:10 UTC
153 points
26 comments, 44 min read, LW link
(thezvi.wordpress.com)

Value Claims (In Particular) Are Usually Bullshit

johnswentworth, 30 May 2024 6:26 UTC
143 points
18 comments, 2 min read, LW link

Do you believe in hundred dollar bills lying on the ground? Consider humming

Elizabeth, 16 May 2024 0:00 UTC
122 points
22 comments, 6 min read, LW link
(acesounderglass.com)

Awakening

lsusr, 30 May 2024 7:03 UTC
118 points
79 comments, 9 min read, LW link

[Question] Which skincare products are evidence-based?

Vanessa Kosoy, 2 May 2024 15:22 UTC
117 points
47 comments, 1 min read, LW link

Talent Needs of Technical AI Safety Teams

24 May 2024 0:36 UTC
115 points
64 comments, 14 min read, LW link

introduction to cancer vaccines

bhauth, 5 May 2024 1:06 UTC
113 points
19 comments, 5 min read, LW link
(www.bhauth.com)

The Pearly Gates

lsusr, 30 May 2024 4:01 UTC
111 points
6 comments, 3 min read, LW link

Clarifying METR’s Auditing Role

Beth Barnes, 30 May 2024 18:41 UTC
108 points
1 comment, 2 min read, LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

20 May 2024 17:53 UTC
105 points
4 comments, 3 min read, LW link

Key takeaways from our EA and alignment research surveys

3 May 2024 18:10 UTC
103 points
10 comments, 21 min read, LW link

Response to nostalgebraist: proudly waving my moral-antirealist battle flag

Steven Byrnes, 29 May 2024 16:48 UTC
102 points
29 comments, 11 min read, LW link

Advice for Activists from the History of Environmentalism

Jeffrey Heninger, 16 May 2024 18:40 UTC
100 points
8 comments, 6 min read, LW link
(blog.aiimpacts.org)

Explaining a Math Magic Trick

Robert_AIZI, 5 May 2024 19:41 UTC
97 points
10 comments, 5 min read, LW link

[Question] How to get nerds fascinated about mysterious chronic illness research?

riceissa, 27 May 2024 22:58 UTC
95 points
50 comments, 2 min read, LW link

I am the Golden Gate Bridge

Zvi, 27 May 2024 14:40 UTC
95 points
6 comments, 27 min read, LW link
(thezvi.wordpress.com)

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

6 May 2024 7:07 UTC
95 points
13 comments, 1 min read, LW link
(arxiv.org)

Apollo Research 1-year update

29 May 2024 17:44 UTC
93 points
0 comments, 7 min read, LW link

Review: Conor Moreton’s “Civilization & Cooperation”

Duncan Sabien (Deactivated), 26 May 2024 19:32 UTC
89 points
8 comments, 38 min read, LW link

Teaching CS During Take-Off

andrew carle, 14 May 2024 22:45 UTC
88 points
13 comments, 2 min read, LW link

We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”

Lukas_Gloor, 9 May 2024 15:43 UTC
87 points
36 comments, 5 min read, LW link

OpenAI: Helen Toner Speaks

Zvi, 30 May 2024 21:10 UTC
86 points
8 comments, 13 min read, LW link
(thezvi.wordpress.com)

Environmentalism in the United States Is Unusually Partisan

Jeffrey Heninger, 13 May 2024 21:23 UTC
85 points
26 comments, 4 min read, LW link
(blog.aiimpacts.org)

Hardshipification

Jonathan Moregård, 28 May 2024 20:02 UTC
84 points
17 comments, 2 min read, LW link
(honestliving.substack.com)

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka, 3 May 2024 18:10 UTC
84 points
10 comments, 4 min read, LW link
(aisafety.dance)

MATS Winter 2023-24 Retrospective

11 May 2024 0:09 UTC
84 points
28 comments, 49 min read, LW link

AISafety.com – Resources for AI Safety

17 May 2024 15:57 UTC
81 points
3 comments, 1 min read, LW link

New voluntary commitments (AI Seoul Summit)

Zach Stein-Perlman, 21 May 2024 11:00 UTC
81 points
17 comments, 7 min read, LW link
(www.gov.uk)

MIRI’s May 2024 Newsletter

Harlan, 15 May 2024 0:13 UTC
79 points
1 comment, 3 min read, LW link
(intelligence.org)

LessWrong Community Weekend 2024, open for applications

1 May 2024 10:18 UTC
79 points
2 comments, 7 min read, LW link

My thesis (Algorithmic Bayesian Epistemology) explained in more depth

Eric Neyman, 9 May 2024 19:43 UTC
79 points
4 comments, 27 min read, LW link
(ericneyman.wordpress.com)

Reward hacking behavior can generalize across tasks

28 May 2024 16:33 UTC
78 points
5 comments, 21 min read, LW link

ACX Covid Origins Post convinced readers

ErnestScribbler, 1 May 2024 13:06 UTC
77 points
7 comments, 2 min read, LW link

Q&A on Pro­posed SB 1047

Zvi, 2 May 2024 15:10 UTC
74 points
8 comments, 44 min read, LW link
(thezvi.wordpress.com)