A Bear Case: My Predictions Regarding AI Progress

Thane Ruthenis, Mar 5, 2025, 4:41 PM
354 points
155 comments, 9 min read, LW link

Will Jesus Christ return in an election year?

Eric Neyman, Mar 24, 2025, 4:50 PM
330 points
46 comments, 4 min read, LW link
(ericneyman.wordpress.com)

Policy for LLM Writing on LessWrong

jimrandomh, Mar 24, 2025, 9:41 PM
321 points
65 comments, 2 min read, LW link

Recent AI model progress feels mostly like bullshit

lc, Mar 24, 2025, 7:28 PM
312 points
80 comments, 8 min read, LW link
(zeropath.com)

Tracing the Thoughts of a Large Language Model

Adam Jermyn, Mar 27, 2025, 5:20 PM
298 points
24 comments, 10 min read, LW link
(www.anthropic.com)

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda, Mar 22, 2025, 10:13 AM
292 points
28 comments, 4 min read, LW link
(www.neelnanda.io)

Trojan Sky

Richard_Ngo, Mar 11, 2025, 3:14 AM
241 points
39 comments, 12 min read, LW link
(www.narrativeark.xyz)

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman, Mar 19, 2025, 4:00 PM
241 points
104 comments, 5 min read, LW link
(metr.org)

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas, Mar 16, 2025, 6:54 PM
198 points
34 comments, 3 min read, LW link

Intention to Treat

Alicorn, Mar 20, 2025, 8:01 PM
190 points
5 comments, 2 min read, LW link

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel Kokotajlo, Mar 11, 2025, 2:17 AM
183 points
25 comments, 4 min read, LW link
(openai.com)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

Mar 17, 2025, 7:11 PM
179 points
7 comments, 6 min read, LW link

So how well is Claude playing Pokémon?

Julian Bradshaw, Mar 7, 2025, 5:54 AM
170 points
74 comments, 5 min read, LW link

On the Rationality of Deterring ASI

Dan H, Mar 5, 2025, 4:11 PM
166 points
34 comments, 4 min read, LW link
(nationalsecurity.ai)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy, Mar 16, 2025, 4:52 PM
157 points
25 comments, 1 min read, LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
155 points
40 comments, 6 min read, LW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout, Mar 2, 2025, 7:51 PM
154 points
27 comments, 1 min read, LW link
(turntrout.com)

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard, Mar 2, 2025, 8:26 PM
153 points
26 comments, 9 min read, LW link

Conceptual Rounding Errors

Jan_Kulveit, Mar 26, 2025, 7:00 PM
149 points
15 comments, 3 min read, LW link
(boundedlyrational.substack.com)

Methods for strong human germline engineering

TsviBT, Mar 3, 2025, 8:13 AM
149 points
28 comments, 108 min read, LW link

The Most Forbidden Technique

Zvi, Mar 12, 2025, 1:20 PM
143 points
9 comments, 17 min read, LW link
(thezvi.wordpress.com)

The Hidden Cost of Our Lies to AI

Nicholas Andresen, Mar 6, 2025, 5:03 AM
140 points
18 comments, 7 min read, LW link
(substack.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
139 points
15 comments, 13 min read, LW link

The Milton Friedman Model of Policy Change

JohnofCharleston, Mar 4, 2025, 12:38 AM
136 points
17 comments, 4 min read, LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis, Mar 4, 2025, 4:23 PM
135 points
51 comments, 3 min read, LW link

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit, Mar 28, 2025, 9:03 PM
127 points
14 comments, 13 min read, LW link

Anthropic, and taking “technical philosophy” more seriously

Raemon, Mar 13, 2025, 1:48 AM
125 points
29 comments, 11 min read, LW link

[Question] when will LLMs become human-level bloggers?

nostalgebraist, Mar 9, 2025, 9:10 PM
124 points
34 comments, 6 min read, LW link

Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger, Mar 11, 2025, 11:52 AM
121 points
23 comments, 11 min read, LW link
(alignment.anthropic.com)

How I’ve run major projects

benkuhn, Mar 16, 2025, 6:40 PM
119 points
10 comments, 8 min read, LW link
(www.benkuhn.net)

Do models say what they learn?

Mar 22, 2025, 3:19 PM
115 points
12 comments, 13 min read, LW link

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

Mar 26, 2025, 7:07 PM
109 points
15 comments, 29 min read, LW link
(deepmindsafetyresearch.medium.com)

2024 Unofficial LessWrong Survey Results

Screwtape, Mar 14, 2025, 10:29 PM
109 points
28 comments, 45 min read, LW link

Explaining British Naval Dominance During the Age of Sail

Arjun Panickssery, Mar 28, 2025, 5:47 AM
109 points
5 comments, 4 min read, LW link
(arjunpanickssery.substack.com)

Third-wave AI safety needs sociopolitical thinking

Richard_Ngo, Mar 27, 2025, 12:55 AM
99 points
23 comments, 26 min read, LW link

AI Control May Increase Existential Risk

Jan_Kulveit, Mar 11, 2025, 2:30 PM
98 points
13 comments, 1 min read, LW link

What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit

garrison, Mar 6, 2025, 7:49 PM
97 points
0 comments, LW link
(garrisonlovely.substack.com)

How I talk to those above me

Maxwell Peterson, Mar 30, 2025, 6:54 AM
97 points
14 comments, 8 min read, LW link

Towards a scale-free theory of intelligent agency

Richard_Ngo, Mar 21, 2025, 1:39 AM
96 points
42 comments, 13 min read, LW link
(www.mindthefuture.info)

Vacuum Decay: Expert Survey Results

JessRiedel, Mar 13, 2025, 6:31 PM
96 points
26 comments, LW link

Elite Coordination via the Consensus of Power

Richard_Ngo, Mar 19, 2025, 6:56 AM
92 points
15 comments, 12 min read, LW link
(www.mindthefuture.info)

How I force LLMs to generate correct code

claudio, Mar 21, 2025, 2:40 PM
91 points
7 comments, 5 min read, LW link

We should start looking for scheming “in the wild”

Marius Hobbhahn, Mar 6, 2025, 1:49 PM
89 points
4 comments, 5 min read, LW link

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo, Mar 3, 2025, 8:08 PM
87 points
19 comments, 18 min read, LW link

OpenAI #11: America Action Plan

Zvi, Mar 18, 2025, 12:50 PM
83 points
3 comments, 6 min read, LW link
(thezvi.wordpress.com)

Mistral Large 2 (123B) exhibits alignment faking

Mar 27, 2025, 3:39 PM
80 points
4 comments, 13 min read, LW link

Open problems in emergent misalignment

Mar 1, 2025, 9:47 AM
80 points
13 comments, 7 min read, LW link

Elon Musk May Be Transitioning to Bipolar Type I

Cyborg25, Mar 11, 2025, 5:45 PM
79 points
22 comments, 4 min read, LW link

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

Mar 18, 2025, 2:48 PM
79 points
12 comments, 5 min read, LW link

AI for AI safety

Joe Carlsmith, Mar 14, 2025, 3:00 PM
78 points
13 comments, 17 min read, LW link
(joecarlsmith.substack.com)