A Bear Case: My Predictions Regarding AI Progress

Thane Ruthenis, Mar 5, 2025, 4:41 PM
354 points
155 comments, 9 min read, LW link

Will Jesus Christ return in an election year?

Eric Neyman, Mar 24, 2025, 4:50 PM
330 points
46 comments, 4 min read, LW link
(ericneyman.wordpress.com)

Policy for LLM Writing on LessWrong

jimrandomh, Mar 24, 2025, 9:41 PM
321 points
65 comments, 2 min read, LW link

Recent AI model progress feels mostly like bullshit

lc, Mar 24, 2025, 7:28 PM
312 points
80 comments, 8 min read, LW link
(zeropath.com)

Tracing the Thoughts of a Large Language Model

Adam Jermyn, Mar 27, 2025, 5:20 PM
298 points
24 comments, 10 min read, LW link
(www.anthropic.com)

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda, Mar 22, 2025, 10:13 AM
292 points
28 comments, 4 min read, LW link
(www.neelnanda.io)

Trojan Sky

Richard_Ngo, Mar 11, 2025, 3:14 AM
241 points
39 comments, 12 min read, LW link
(www.narrativeark.xyz)

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman, Mar 19, 2025, 4:00 PM
241 points
104 comments, 5 min read, LW link
(metr.org)

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas, Mar 16, 2025, 6:54 PM
198 points
34 comments, 3 min read, LW link

Intention to Treat

Alicorn, Mar 20, 2025, 8:01 PM
190 points
5 comments, 2 min read, LW link

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel Kokotajlo, Mar 11, 2025, 2:17 AM
183 points
25 comments, 4 min read, LW link
(openai.com)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

Mar 17, 2025, 7:11 PM
179 points
7 comments, 6 min read, LW link

So how well is Claude playing Pokémon?

Julian Bradshaw, Mar 7, 2025, 5:54 AM
170 points
74 comments, 5 min read, LW link

On the Rationality of Deterring ASI

Dan H, Mar 5, 2025, 4:11 PM
166 points
34 comments, 4 min read, LW link
(nationalsecurity.ai)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy, Mar 16, 2025, 4:52 PM
157 points
25 comments, 1 min read, LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
155 points
40 comments, 6 min read, LW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout, Mar 2, 2025, 7:51 PM
154 points
27 comments, 1 min read, LW link
(turntrout.com)

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard, Mar 2, 2025, 8:26 PM
153 points
26 comments, 9 min read, LW link

Conceptual Rounding Errors

Jan_Kulveit, Mar 26, 2025, 7:00 PM
149 points
15 comments, 3 min read, LW link
(boundedlyrational.substack.com)

Methods for strong human germline engineering

TsviBT, Mar 3, 2025, 8:13 AM
149 points
28 comments, 108 min read, LW link

The Most Forbidden Technique

Zvi, Mar 12, 2025, 1:20 PM
143 points
9 comments, 17 min read, LW link
(thezvi.wordpress.com)

The Hidden Cost of Our Lies to AI

Nicholas Andresen, Mar 6, 2025, 5:03 AM
140 points
18 comments, 7 min read, LW link
(substack.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
139 points
15 comments, 13 min read, LW link

The Milton Friedman Model of Policy Change

JohnofCharleston, Mar 4, 2025, 12:38 AM
136 points
17 comments, 4 min read, LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis, Mar 4, 2025, 4:23 PM
135 points
51 comments, 3 min read, LW link

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit, Mar 28, 2025, 9:03 PM
127 points
14 comments, 13 min read, LW link

Anthropic, and taking “technical philosophy” more seriously

Raemon, Mar 13, 2025, 1:48 AM
125 points
29 comments, 11 min read, LW link

[Question] when will LLMs become human-level bloggers?

nostalgebraist, Mar 9, 2025, 9:10 PM
124 points
34 comments, 6 min read, LW link

Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger, Mar 11, 2025, 11:52 AM
121 points
23 comments, 11 min read, LW link
(alignment.anthropic.com)

How I’ve run major projects

benkuhn, Mar 16, 2025, 6:40 PM
119 points
10 comments, 8 min read, LW link
(www.benkuhn.net)

Do models say what they learn?

Mar 22, 2025, 3:19 PM
115 points
12 comments, 13 min read, LW link

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

Mar 26, 2025, 7:07 PM
109 points
15 comments, 29 min read, LW link
(deepmindsafetyresearch.medium.com)

2024 Unofficial LessWrong Survey Results

Screwtape, Mar 14, 2025, 10:29 PM
109 points
28 comments, 45 min read, LW link

Explaining British Naval Dominance During the Age of Sail

Arjun Panickssery, Mar 28, 2025, 5:47 AM
109 points
5 comments, 4 min read, LW link
(arjunpanickssery.substack.com)

Third-wave AI safety needs sociopolitical thinking

Richard_Ngo, Mar 27, 2025, 12:55 AM
99 points
23 comments, 26 min read, LW link

AI Control May Increase Existential Risk

Jan_Kulveit, Mar 11, 2025, 2:30 PM
98 points
13 comments, 1 min read, LW link

What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit

garrison, Mar 6, 2025, 7:49 PM
97 points
0 comments, LW link
(garrisonlovely.substack.com)

How I talk to those above me

Maxwell Peterson, Mar 30, 2025, 6:54 AM
97 points
14 comments, 8 min read, LW link

Towards a scale-free theory of intelligent agency

Richard_Ngo, Mar 21, 2025, 1:39 AM
96 points
42 comments, 13 min read, LW link
(www.mindthefuture.info)

Vacuum Decay: Expert Survey Results

JessRiedel, Mar 13, 2025, 6:31 PM
96 points
26 comments, LW link

Elite Coordination via the Consensus of Power

Richard_Ngo, Mar 19, 2025, 6:56 AM
92 points
15 comments, 12 min read, LW link
(www.mindthefuture.info)

How I force LLMs to generate correct code

claudio, Mar 21, 2025, 2:40 PM
91 points
7 comments, 5 min read, LW link

We should start looking for scheming “in the wild”

Marius Hobbhahn, Mar 6, 2025, 1:49 PM
89 points
4 comments, 5 min read, LW link

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo, Mar 3, 2025, 8:08 PM
87 points
19 comments, 18 min read, LW link

OpenAI #11: America Action Plan

Zvi, Mar 18, 2025, 12:50 PM
83 points
3 comments, 6 min read, LW link
(thezvi.wordpress.com)

Mistral Large 2 (123B) exhibits alignment faking

Mar 27, 2025, 3:39 PM
80 points
4 comments, 13 min read, LW link

Open problems in emergent misalignment

Mar 1, 2025, 9:47 AM
80 points
13 comments, 7 min read, LW link

Elon Musk May Be Transitioning to Bipolar Type I

Cyborg25, Mar 11, 2025, 5:45 PM
79 points
22 comments, 4 min read, LW link

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

Mar 18, 2025, 2:48 PM
79 points
12 comments, 5 min read, LW link

AI for AI safety

Joe Carlsmith, Mar 14, 2025, 3:00 PM
78 points
13 comments, 17 min read, LW link
(joecarlsmith.substack.com)