Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

Jan 24, 2025, 8:20 PM
180 points
61 comments · 5 min read · LW link

Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall

Vladimir_Nesov · May 1, 2025, 1:54 PM
172 points
22 comments · 5 min read · LW link

So how well is Claude playing Pokémon?

Julian Bradshaw · Mar 7, 2025, 5:54 AM
171 points
74 comments · 5 min read · LW link

How will we update about scheming?

ryan_greenblatt · Jan 6, 2025, 8:21 PM
171 points
20 comments · 37 min read · LW link

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

Kaj_Sotala · Apr 15, 2025, 3:56 PM
168 points
50 comments · 18 min read · LW link

On the Rationality of Deterring ASI

Dan H · Mar 5, 2025, 4:11 PM
166 points
34 comments · 4 min read · LW link
(nationalsecurity.ai)

Short Timelines Don’t Devalue Long Horizon Research

Vladimir_Nesov · Apr 9, 2025, 12:42 AM
166 points
24 comments · 1 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
162 points
58 comments · 2 min read · LW link
(gradual-disempowerment.ai)

Maximizing Communication, not Traffic

jefftk · Jan 5, 2025, 1:00 PM
161 points
10 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Have LLMs Generated Novel Insights?

Feb 23, 2025, 6:22 PM
158 points
38 comments · 2 min read · LW link

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · Mar 16, 2025, 4:52 PM
157 points
25 comments · 1 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
155 points
40 comments · 6 min read · LW link

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen · Apr 14, 2025, 5:38 PM
154 points
42 comments · 7 min read · LW link
(adamkarvonen.github.io)

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout · Mar 2, 2025, 7:51 PM
154 points
27 comments · 1 min read · LW link
(turntrout.com)

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard · Mar 2, 2025, 8:26 PM
154 points
26 comments · 9 min read · LW link

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape · Feb 16, 2025, 1:43 AM
153 points
3 comments · 1 min read · LW link

Don’t ignore bad vibes you get from people

Kaj_Sotala · Jan 18, 2025, 9:20 AM
152 points
50 comments · 2 min read · LW link
(kajsotala.fi)

OpenAI #10: Reflections

Zvi · Jan 7, 2025, 5:00 PM
149 points
7 comments · 11 min read · LW link
(thezvi.wordpress.com)

Conceptual Rounding Errors

Jan_Kulveit · Mar 26, 2025, 7:00 PM
149 points
15 comments · 3 min read · LW link
(boundedlyrational.substack.com)

Capital Ownership Will Not Prevent Human Disempowerment

beren · Jan 5, 2025, 6:00 AM
149 points
18 comments · 14 min read · LW link

Quotes from the Stargate press conference

Nikola Jurkovic · Jan 22, 2025, 12:50 AM
149 points
7 comments · 1 min read · LW link
(www.c-span.org)

Methods for strong human germline engineering

TsviBT · Mar 3, 2025, 8:13 AM
149 points
28 comments · 108 min read · LW link

A computational no-coincidence principle

Eric Neyman · Feb 14, 2025, 9:39 PM
148 points
38 comments · 6 min read · LW link
(www.alignment.org)

Levels of Friction

Zvi · Feb 10, 2025, 1:10 PM
148 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

Winning the power to lose

KatjaGrace · May 20, 2025, 6:40 AM
148 points
37 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Activation space interpretability may be doomed

Jan 8, 2025, 12:49 PM
148 points
33 comments · 8 min read · LW link

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Thane Ruthenis · Feb 21, 2025, 8:15 PM
148 points
51 comments · 6 min read · LW link

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

Apr 8, 2025, 5:32 PM
146 points
20 comments · 12 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · Jan 23, 2025, 6:41 PM
145 points
5 comments · 13 min read · LW link

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · Jan 13, 2025, 1:23 AM
144 points
32 comments · 3 min read · LW link

The Most Forbidden Technique

Zvi · Mar 12, 2025, 1:20 PM
143 points
9 comments · 17 min read · LW link
(thezvi.wordpress.com)

The Hidden Cost of Our Lies to AI

Nicholas Andresen · Mar 6, 2025, 5:03 AM
142 points
18 comments · 7 min read · LW link
(substack.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
141 points
15 comments · 13 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
141 points
55 comments · 8 min read · LW link

OpenAI #12: Battle of the Board Redux

Zvi · Mar 31, 2025, 3:50 PM
141 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)

Ten people on the inside

Buck · Jan 28, 2025, 4:41 PM
139 points
28 comments · 4 min read · LW link

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo · Apr 18, 2025, 12:27 PM
139 points
15 comments · 6 min read · LW link

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman · Jan 6, 2025, 7:57 PM
139 points
57 comments · 13 min read · LW link

Planning for Extreme AI Risks

joshc · Jan 29, 2025, 6:33 PM
139 points
5 comments · 16 min read · LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis · Mar 4, 2025, 4:23 PM
137 points
51 comments · 3 min read · LW link

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem · Jan 7, 2025, 7:11 PM
137 points
5 comments · 1 min read · LW link

The Failed Strategy of Artificial Intelligence Doomers

Ben Pace · Jan 31, 2025, 6:56 PM
136 points
78 comments · 5 min read · LW link
(www.palladiummag.com)

Anomalous Tokens in DeepSeek-V3 and r1

henry · Jan 25, 2025, 10:55 PM
136 points
3 comments · 7 min read · LW link

The Milton Friedman Model of Policy Change

JohnofCharleston · Mar 4, 2025, 12:38 AM
136 points
17 comments · 4 min read · LW link

Training on Documents About Reward Hacking Induces Reward Hacking

Jan 21, 2025, 9:32 PM
131 points
15 comments · 2 min read · LW link
(alignment.anthropic.com)

AI Doomerism in 1879

David Gross · May 13, 2025, 2:48 AM
131 points
36 comments · 8 min read · LW link

It’s Okay to Feel Bad for a Bit

moridinamael · May 10, 2025, 11:24 PM
131 points
26 comments · 3 min read · LW link

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
130 points
5 comments · 6 min read · LW link

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
130 points
11 comments · 5 min read · LW link

Consider not donating under $100 to political candidates

DanielFilan · May 11, 2025, 3:20 AM
130 points
31 comments · 1 min read · LW link
(danielfilan.com)