Scale Was All We Needed, At First

Gabe M · Feb 14, 2024, 1:49 AM
295 points
34 comments · 8 min read
(aiacumen.substack.com)

Raising children on the eve of AI

juliawise · Feb 15, 2024, 9:28 PM
275 points
47 comments · 5 min read

“No-one in my org puts money in their pension”

Tobes · Feb 16, 2024, 6:33 PM
270 points
16 comments · 9 min read
(seekingtobejolly.substack.com)

Believing In

AnnaSalamon · Feb 8, 2024, 7:06 AM
240 points
51 comments · 13 min read

CFAR Takeaways: Andrew Critch

Raemon · Feb 14, 2024, 1:37 AM
217 points
64 comments · 5 min read

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · Feb 3, 2024, 8:36 PM
208 points
156 comments · 9 min read

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison · Feb 10, 2024, 7:52 PM
198 points
52 comments
(garrisonlovely.substack.com)

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

Ricki Heicklen · Feb 22, 2024, 11:56 PM
186 points
5 comments · 4 min read
(bayesshammai.substack.com)

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo · Feb 16, 2024, 6:53 PM
181 points
6 comments · 4 min read

Timaeus’s First Four Months

Feb 28, 2024, 5:01 PM
173 points
6 comments · 6 min read

And All the Shoggoths Merely Players

Zack_M_Davis · Feb 10, 2024, 7:56 PM
170 points
57 comments · 12 min read

Masterpiece

Richard_Ngo · Feb 13, 2024, 11:10 PM
166 points
21 comments · 4 min read
(www.narrativeark.xyz)

2023 Survey Results

Screwtape · Feb 16, 2024, 10:24 PM
150 points
26 comments · 44 min read

Updatelessness doesn’t solve most problems

Martín Soto · Feb 8, 2024, 5:30 PM
135 points
45 comments · 12 min read

Things I’ve Grieved

Raemon · Feb 18, 2024, 7:32 PM
125 points
6 comments · 2 min read

The Pareto Best and the Curse of Doom

Screwtape · Feb 21, 2024, 11:10 PM
117 points
21 comments · 9 min read

Rationality Research Report: Towards 10x OODA Looping?

Raemon · Feb 24, 2024, 9:06 PM
117 points
25 comments · 15 min read

Attitudes about Applied Rationality

Camille Berger · Feb 3, 2024, 2:42 PM
108 points
18 comments · 4 min read

Skills I’d like my collaborators to have

Raemon · Feb 9, 2024, 8:20 AM
106 points
9 comments · 8 min read

New LessWrong review winner UI (“The LeastWrong” section and full-art post pages)

kave · Feb 28, 2024, 2:42 AM
105 points
64 comments · 1 min read

A Chess-GPT Linear Emergent World Representation

Adam Karvonen · Feb 8, 2024, 4:25 AM
105 points
14 comments · 7 min read
(adamkarvonen.github.io)

Lsusr’s Rationality Dojo

lsusr · Feb 13, 2024, 5:52 AM
103 points
17 comments · 2 min read

Dreams of AI alignment: The danger of suggestive names

TurnTrout · Feb 10, 2024, 1:22 AM
103 points
59 comments · 4 min read

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom · Feb 2, 2024, 6:54 AM
103 points
37 comments · 15 min read

Counting arguments provide no evidence for AI doom

Feb 27, 2024, 11:03 PM
101 points
188 comments · 14 min read

My cover story in Jacobin on AI capitalism and the x-risk debates

garrison · Feb 12, 2024, 11:34 PM
98 points
5 comments
(jacobin.com)

Announcing the London Initiative for Safe AI (LISA)

Feb 2, 2024, 11:17 PM
98 points
0 comments · 9 min read

Things You’re Allowed to Do: University Edition

Saul Munn · Feb 6, 2024, 12:36 AM
97 points
13 comments · 5 min read
(www.brasstacks.blog)

OpenAI’s Sora is an agent

Caleb Biddulph · Feb 16, 2024, 7:35 AM
96 points
25 comments · 4 min read

Ideological Bayesians

Kevin Dorst · Feb 25, 2024, 2:17 PM
96 points
4 comments · 10 min read
(kevindorst.substack.com)

Everything Wrong with Roko’s Claims about an Engineered Pandemic

WitheringWeights · Feb 22, 2024, 3:59 PM
94 points
10 comments · 16 min read

How well do truth probes generalise?

mishajw · Feb 24, 2024, 2:12 PM
93 points
11 comments · 9 min read

How to train your own “Sleeper Agents”

evhub · Feb 7, 2024, 12:31 AM
92 points
11 comments · 2 min read

Debating with More Persuasive LLMs Leads to More Truthful Answers

Feb 7, 2024, 9:28 PM
89 points
14 comments · 9 min read
(arxiv.org)

story-based decision-making

bhauth · Feb 7, 2024, 2:35 AM
89 points
11 comments · 4 min read

More Hyphenation

Arjun Panickssery · Feb 7, 2024, 7:43 PM
88 points
19 comments · 1 min read
(arjunpanickssery.substack.com)

Addressing Feature Suppression in SAEs

Feb 16, 2024, 6:32 PM
86 points
4 comments · 10 min read

Retirement Accounts and Short Timelines

jefftk · Feb 19, 2024, 6:50 PM
83 points
35 comments · 2 min read
(www.jefftk.com)

AI #51: Altman’s Ambition

Zvi · Feb 20, 2024, 7:50 PM
83 points
5 comments · 38 min read
(thezvi.wordpress.com)

The Gemini Incident

Zvi · Feb 22, 2024, 9:00 PM
80 points
19 comments · 18 min read
(thezvi.wordpress.com)

Attention SAEs Scale to GPT-2 Small

Feb 3, 2024, 6:50 AM
78 points
4 comments · 8 min read

Analogies between scaling labs and misaligned superintelligent AI

scasper · Feb 21, 2024, 7:29 PM
77 points
5 comments · 4 min read

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · Feb 6, 2024, 7:10 PM
75 points
12 comments · 16 min read

Implementing activation steering

Annah · Feb 5, 2024, 5:51 PM
75 points
8 comments · 7 min read

Do sparse autoencoders find “true features”?

Demian Till · Feb 22, 2024, 6:06 PM
74 points
33 comments · 11 min read

The One and a Half Gemini

Zvi · Feb 22, 2024, 1:10 PM
73 points
4 comments · 8 min read
(thezvi.wordpress.com)

Survey for alignment researchers!

Feb 2, 2024, 8:41 PM
71 points
11 comments · 1 min read

Wrong answer bias

lemonhope · Feb 1, 2024, 8:05 PM
71 points
23 comments · 1 min read

Preventing model exfiltration with upload limits

ryan_greenblatt · Feb 6, 2024, 4:29 PM
69 points
22 comments · 14 min read

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c · Feb 1, 2024, 9:30 PM
69 points
17 comments · 1 min read
(www.aria.org.uk)