LessWrong’s (first) album: I Have Been A Good Bing

1 Apr 2024 7:33 UTC
531 points
159 comments · 11 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai · 16 Apr 2024 21:16 UTC
368 points
83 comments · 12 min read · LW link

There is way too much serendipity

Malmesbury · 19 Jan 2024 19:37 UTC
350 points
56 comments · 7 min read · LW link

[April Fools’ Day] Introducing Open Asteroid Impact

Linch · 1 Apr 2024 8:14 UTC
324 points
29 comments · 1 min read · LW link
(openasteroidimpact.org)

The Best Tacit Knowledge Videos on Every Subject

Parker Conley · 31 Mar 2024 17:14 UTC
315 points
129 comments · 16 min read · LW link

Thoughts on seed oil

dynomight · 20 Apr 2024 12:29 UTC
303 points
114 comments · 17 min read · LW link
(dynomight.net)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

12 Jan 2024 19:51 UTC
291 points
94 comments · 3 min read · LW link
(arxiv.org)

My hour of memoryless lucidity

Eric Neyman · 4 May 2024 1:40 UTC
283 points
19 comments · 5 min read · LW link
(ericneyman.wordpress.com)

Gentleness and the artificial Other

Joe Carlsmith · 2 Jan 2024 18:21 UTC
267 points
33 comments · 11 min read · LW link

Scale Was All We Needed, At First

Gabe M · 14 Feb 2024 1:49 UTC
263 points
31 comments · 8 min read · LW link
(aiacumen.substack.com)

Express interest in an “FHI of the West”

habryka · 18 Apr 2024 3:32 UTC
260 points
41 comments · 3 min read · LW link

On green

Joe Carlsmith · 21 Mar 2024 17:38 UTC
258 points
34 comments · 31 min read · LW link

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget · 16 Apr 2024 16:22 UTC
254 points
59 comments · 1 min read · LW link
(www.commerce.gov)

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman · 16 Mar 2024 22:56 UTC
251 points
14 comments · 7 min read · LW link
(arxiv.org)

Failures in Kindness

silentbob · 26 Mar 2024 21:30 UTC
246 points
27 comments · 9 min read · LW link

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
245 points
66 comments · 28 min read · LW link

“No-one in my org puts money in their pension”

Tobes · 16 Feb 2024 18:33 UTC
243 points
7 comments · 9 min read · LW link
(seekingtobejolly.substack.com)

My Clients, The Liars

ymeskhout · 5 Mar 2024 21:06 UTC
231 points
85 comments · 7 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · 3 Feb 2024 20:36 UTC
220 points
156 comments · 9 min read · LW link

MIRI 2024 Mission and Strategy Update

Malo · 5 Jan 2024 0:20 UTC
216 points
44 comments · 8 min read · LW link

CFAR Takeaways: Andrew Critch

Raemon · 14 Feb 2024 1:37 UTC
213 points
62 comments · 5 min read · LW link

Believing In

AnnaSalamon · 8 Feb 2024 7:06 UTC
212 points
49 comments · 13 min read · LW link

ChatGPT can learn indirect control

Raymond D · 21 Mar 2024 21:11 UTC
212 points
23 comments · 1 min read · LW link

Modern Transformers are AGI, and Human-Level

abramdemski · 26 Mar 2024 17:46 UTC
205 points
89 comments · 5 min read · LW link

“How could I have thought that faster?”

mesaoptimizer · 11 Mar 2024 10:56 UTC
200 points
31 comments · 2 min read · LW link
(twitter.com)

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison · 10 Feb 2024 19:52 UTC
198 points
52 comments · 1 min read · LW link
(garrisonlovely.substack.com)

Funny Anecdote of Eliezer From His Sister

Daniel Birnbaum · 22 Apr 2024 22:05 UTC
197 points
5 comments · 2 min read · LW link

Introducing AI Lab Watch

Zach Stein-Perlman · 30 Apr 2024 17:00 UTC
197 points
24 comments · 1 min read · LW link
(ailabwatch.org)

My Interview With Cade Metz on His Reporting About Slate Star Codex

Zack_M_Davis · 26 Mar 2024 17:18 UTC
188 points
186 comments · 6 min read · LW link

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
185 points
79 comments · 10 min read · LW link

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

Ricki Heicklen · 22 Feb 2024 23:56 UTC
184 points
5 comments · 4 min read · LW link
(bayesshammai.substack.com)

Daniel Kahneman has died

DanielFilan · 27 Mar 2024 15:59 UTC
183 points
11 comments · 1 min read · LW link
(www.washingtonpost.com)

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
182 points
17 comments · 73 min read · LW link

This might be the last AI Safety Camp

24 Jan 2024 9:33 UTC
181 points
33 comments · 1 min read · LW link

The impossible problem of due process

mingyuan · 16 Jan 2024 5:18 UTC
180 points
63 comments · 14 min read · LW link

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
179 points
23 comments · 2 min read · LW link

OMMC Announces RIP

1 Apr 2024 23:20 UTC
178 points
5 comments · 2 min read · LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth · 23 Apr 2024 22:19 UTC
178 points
94 comments · 1 min read · LW link

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo · 16 Feb 2024 18:53 UTC
174 points
6 comments · 4 min read · LW link

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern · 17 Apr 2024 13:54 UTC
174 points
22 comments · 1 min read · LW link
(www.futureofhumanityinstitute.org)

Toward a Broader Conception of Adverse Selection

Ricki Heicklen · 14 Mar 2024 22:40 UTC
174 points
61 comments · 13 min read · LW link
(bayesshammai.substack.com)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

18 Apr 2024 0:27 UTC
171 points
18 comments · 7 min read · LW link

On Not Pulling The Ladder Up Behind You

Screwtape · 26 Apr 2024 21:58 UTC
169 points
16 comments · 9 min read · LW link

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein · 15 Apr 2024 7:02 UTC
168 points
41 comments · 4 min read · LW link

Timaeus’s First Four Months

28 Feb 2024 17:01 UTC
167 points
6 comments · 6 min read · LW link

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky · 14 Mar 2024 2:02 UTC
165 points
84 comments · 25 min read · LW link

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
160 points
60 comments · 57 min read · LW link

Mechanistically Eliciting Latent Behaviors in Language Models

30 Apr 2024 18:51 UTC
156 points
37 comments · 45 min read · LW link

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks · 3 Jan 2024 19:44 UTC
154 points
61 comments · 16 min read · LW link

Many arguments for AI x-risk are wrong

TurnTrout · 5 Mar 2024 2:31 UTC
153 points
76 comments · 12 min read · LW link