Scale Was All We Needed, At First

Gabe M · 14 Feb 2024 1:49 UTC
286 points
33 comments · 8 min read · LW link
(aiacumen.substack.com)

“No-one in my org puts money in their pension”

Tobes · 16 Feb 2024 18:33 UTC
267 points
16 comments · 9 min read · LW link
(seekingtobejolly.substack.com)

Raising children on the eve of AI

juliawise · 15 Feb 2024 21:28 UTC
266 points
47 comments · 5 min read · LW link

Believing In

AnnaSalamon · 8 Feb 2024 7:06 UTC
230 points
51 comments · 13 min read · LW link

CFAR Takeaways: Andrew Critch

Raemon · 14 Feb 2024 1:37 UTC
217 points
62 comments · 5 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · 3 Feb 2024 20:36 UTC
216 points
156 comments · 9 min read · LW link

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison · 10 Feb 2024 19:52 UTC
198 points
52 comments · 1 min read · LW link
(garrisonlovely.substack.com)

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

Ricki Heicklen · 22 Feb 2024 23:56 UTC
186 points
5 comments · 4 min read · LW link
(bayesshammai.substack.com)

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo · 16 Feb 2024 18:53 UTC
177 points
6 comments · 4 min read · LW link

Timaeus’s First Four Months

28 Feb 2024 17:01 UTC
172 points
6 comments · 6 min read · LW link

And All the Shoggoths Merely Players

Zack_M_Davis · 10 Feb 2024 19:56 UTC
160 points
57 comments · 12 min read · LW link

Masterpiece

Richard_Ngo · 13 Feb 2024 23:10 UTC
160 points
21 comments · 4 min read · LW link
(www.narrativeark.xyz)

2023 Survey Results

Screwtape · 16 Feb 2024 22:24 UTC
150 points
26 comments · 44 min read · LW link

Updatelessness doesn’t solve most problems

Martín Soto · 8 Feb 2024 17:30 UTC
130 points
44 comments · 12 min read · LW link

Things I’ve Grieved

Raemon · 18 Feb 2024 19:32 UTC
124 points
6 comments · 2 min read · LW link

The Pareto Best and the Curse of Doom

Screwtape · 21 Feb 2024 23:10 UTC
113 points
21 comments · 9 min read · LW link

Rationality Research Report: Towards 10x OODA Looping?

Raemon · 24 Feb 2024 21:06 UTC
113 points
21 comments · 15 min read · LW link

Attitudes about Applied Rationality

Camille Berger · 3 Feb 2024 14:42 UTC
108 points
18 comments · 4 min read · LW link

Skills I’d like my collaborators to have

Raemon · 9 Feb 2024 8:20 UTC
106 points
9 comments · 8 min read · LW link

New LessWrong review winner UI (“The LeastWrong” section and full-art post pages)

kave · 28 Feb 2024 2:42 UTC
105 points
64 comments · 1 min read · LW link

A Chess-GPT Linear Emergent World Representation

Adam Karvonen · 8 Feb 2024 4:25 UTC
105 points
14 comments · 7 min read · LW link
(adamkarvonen.github.io)

Dreams of AI alignment: The danger of suggestive names

TurnTrout · 10 Feb 2024 1:22 UTC
103 points
59 comments · 4 min read · LW link

Lsusr’s Rationality Dojo

lsusr · 13 Feb 2024 5:52 UTC
102 points
17 comments · 2 min read · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom · 2 Feb 2024 6:54 UTC
102 points
37 comments · 15 min read · LW link

Announcing the London Initiative for Safe AI (LISA)

2 Feb 2024 23:17 UTC
98 points
0 comments · 9 min read · LW link

My cover story in Jacobin on AI capitalism and the x-risk debates

garrison · 12 Feb 2024 23:34 UTC
98 points
5 comments · 1 min read · LW link
(jacobin.com)

OpenAI’s Sora is an agent

CBiddulph · 16 Feb 2024 7:35 UTC
96 points
25 comments · 4 min read · LW link

Counting arguments provide no evidence for AI doom

27 Feb 2024 23:03 UTC
95 points
188 comments · 14 min read · LW link

Ideological Bayesians

Kevin Dorst · 25 Feb 2024 14:17 UTC
95 points
4 comments · 10 min read · LW link
(kevindorst.substack.com)

Things You’re Allowed to Do: University Edition

Saul Munn · 6 Feb 2024 0:36 UTC
94 points
13 comments · 5 min read · LW link
(www.brasstacks.blog)

Everything Wrong with Roko’s Claims about an Engineered Pandemic

WitheringWeights · 22 Feb 2024 15:59 UTC
92 points
10 comments · 16 min read · LW link

How to train your own “Sleeper Agents”

evhub · 7 Feb 2024 0:31 UTC
91 points
11 comments · 2 min read · LW link

story-based decision-making

bhauth · 7 Feb 2024 2:35 UTC
89 points
11 comments · 4 min read · LW link

Debating with More Persuasive LLMs Leads to More Truthful Answers

7 Feb 2024 21:28 UTC
88 points
14 comments · 9 min read · LW link
(arxiv.org)

More Hyphenation

Arjun Panickssery · 7 Feb 2024 19:43 UTC
87 points
19 comments · 1 min read · LW link
(arjunpanickssery.substack.com)

How well do truth probes generalise?

mishajw · 24 Feb 2024 14:12 UTC
87 points
11 comments · 9 min read · LW link

Addressing Feature Suppression in SAEs

16 Feb 2024 18:32 UTC
86 points
4 comments · 10 min read · LW link

Retirement Accounts and Short Timelines

jefftk · 19 Feb 2024 18:50 UTC
83 points
35 comments · 2 min read · LW link
(www.jefftk.com)

AI #51: Altman’s Ambition

Zvi · 20 Feb 2024 19:50 UTC
83 points
5 comments · 38 min read · LW link
(thezvi.wordpress.com)

The Gemini Incident

Zvi · 22 Feb 2024 21:00 UTC
80 points
19 comments · 18 min read · LW link
(thezvi.wordpress.com)

Attention SAEs Scale to GPT-2 Small

3 Feb 2024 6:50 UTC
77 points
4 comments · 8 min read · LW link

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · 6 Feb 2024 19:10 UTC
75 points
12 comments · 16 min read · LW link

Analogies between scaling labs and misaligned superintelligent AI

scasper · 21 Feb 2024 19:29 UTC
75 points
5 comments · 4 min read · LW link

The One and a Half Gemini

Zvi · 22 Feb 2024 13:10 UTC
73 points
4 comments · 8 min read · LW link
(thezvi.wordpress.com)

Do sparse autoencoders find “true features”?

Demian Till · 22 Feb 2024 18:06 UTC
73 points
33 comments · 11 min read · LW link

Survey for alignment researchers!

2 Feb 2024 20:41 UTC
71 points
11 comments · 1 min read · LW link

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c · 1 Feb 2024 21:30 UTC
69 points
17 comments · 1 min read · LW link
(www.aria.org.uk)

Implementing activation steering

Annah · 5 Feb 2024 17:51 UTC
68 points
7 comments · 7 min read · LW link

Preventing model exfiltration with upload limits

ryan_greenblatt · 6 Feb 2024 16:29 UTC
66 points
21 comments · 14 min read · LW link

Most experts believe COVID-19 was probably not a lab leak

DanielFilan · 2 Feb 2024 19:28 UTC
66 points
89 comments · 2 min read · LW link
(gcrinstitute.org)