The Case Against AI Control Research

johnswentworth

Jan 21, 2025, 4:03 PM

353 points

80 comments · 6 min read · LW link

What’s the short timeline plan?

Marius Hobbhahn

Jan 2, 2025, 2:59 PM

352 points

49 comments · 23 min read · LW link

The Gentle Romance

Richard_Ngo

Jan 19, 2025, 6:29 PM

242 points

46 comments · 15 min read · LW link

(www.asimov.press)

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes

Jan 28, 2025, 6:47 PM

208 points

26 comments · 31 min read · LW link

Mechanisms too simple for humans to design

Malmesbury

Jan 22, 2025, 4:54 PM

206 points

45 comments · 15 min read · LW link

What Is The Alignment Problem?

johnswentworth

Jan 16, 2025, 1:20 AM

180 points

50 comments · 25 min read · LW link

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

johnswentworth and David Lorell

Jan 24, 2025, 8:20 PM

180 points

61 comments · 5 min read · LW link

How will we update about scheming?

ryan_greenblatt

Jan 6, 2025, 8:21 PM

171 points

20 comments · 37 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet) and David Duvenaud

Jan 30, 2025, 5:03 PM

162 points

58 comments · 2 min read · LW link

(gradual-disempowerment.ai)

Maximizing Communication, not Traffic

jefftk

Jan 5, 2025, 1:00 PM

161 points

10 comments · 1 min read · LW link

(www.jefftk.com)

Don’t ignore bad vibes you get from people

Kaj_Sotala

Jan 18, 2025, 9:20 AM

150 points

50 comments · 2 min read · LW link

(kajsotala.fi)

OpenAI #10: Reflections

Zvi

Jan 7, 2025, 5:00 PM

149 points

7 comments · 11 min read · LW link

(thezvi.wordpress.com)

Capital Ownership Will Not Prevent Human Disempowerment

beren

Jan 5, 2025, 6:00 AM

149 points

18 comments · 14 min read · LW link

Quotes from the Stargate press conference

Nikola Jurkovic

Jan 22, 2025, 12:50 AM

149 points

7 comments · 1 min read · LW link

(www.c-span.org)

Activation space interpretability may be doomed

bilalchughtai and Lucius Bushnaq

Jan 8, 2025, 12:49 PM

148 points

33 comments · 8 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt

Jan 23, 2025, 6:41 PM

145 points

5 comments · 13 min read · LW link

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes

Jan 13, 2025, 1:23 AM

144 points

32 comments · 3 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson

Jan 10, 2025, 4:53 PM

141 points

55 comments · 8 min read · LW link

Ten people on the inside

Buck

Jan 28, 2025, 4:41 PM

139 points

28 comments · 4 min read · LW link

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman

Jan 6, 2025, 7:57 PM

139 points

57 comments · 13 min read · LW link

Planning for Extreme AI Risks

joshc

Jan 29, 2025, 6:33 PM

139 points

5 comments · 16 min read · LW link

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem

Jan 7, 2025, 7:11 PM

137 points

5 comments · 1 min read · LW link

Anomalous Tokens in DeepSeek-V3 and r1

henry

Jan 25, 2025, 10:55 PM

136 points

3 comments · 7 min read · LW link

Training on Documents About Reward Hacking Induces Reward Hacking

evhub and Nathan Hu

Jan 21, 2025, 9:32 PM

131 points

15 comments · 2 min read · LW link

(alignment.anthropic.com)

Tell me about yourself: LLMs are aware of their learned behaviors

Martín Soto and Owain_Evans

Jan 22, 2025, 12:47 AM

130 points

5 comments · 6 min read · LW link

Building AI Research Fleets

Ben Goldhaber and Jesse Hoogland

Jan 12, 2025, 6:23 PM

130 points

11 comments · 5 min read · LW link

Parkinson’s Law and the Ideology of Statistics

Benquo

Jan 4, 2025, 3:49 PM

127 points

7 comments · 8 min read · LW link

(benjaminrosshoffman.com)

The Intelligence Curse

lukedrago

Jan 3, 2025, 7:07 PM

126 points

27 comments · 18 min read · LW link

(lukedrago.substack.com)

2024 in AI predictions

jessicata

Jan 1, 2025, 8:29 PM

117 points

3 comments · 8 min read · LW link

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA

Jan 28, 2025, 11:36 PM

112 points

30 comments · 13 min read · LW link

Aristocracy and Hostage Capital

Arjun Panickssery

Jan 8, 2025, 7:38 PM

108 points

7 comments · 3 min read · LW link

(arjunpanickssery.substack.com)

Fake thinking and real thinking

Joe Carlsmith

Jan 28, 2025, 8:05 PM

108 points

13 comments · 38 min read · LW link

Attribution-based parameter decomposition

Lucius Bushnaq, Dan Braun, StefanHex, jake_mendel and Lee Sharkey

Jan 25, 2025, 1:12 PM

107 points

21 comments · 4 min read · LW link

(publications.apolloresearch.ai)

My supervillain origin story

Dmitry Vaintrob

Jan 27, 2025, 12:20 PM

106 points

1 comment · 5 min read · LW link

How do you deal w/ Super Stimuli?

Logan Riggs

Jan 14, 2025, 3:14 PM

106 points

25 comments · 3 min read · LW link

Comment on “Death and the Gorgon”

Zack_M_Davis

Jan 1, 2025, 5:47 AM

103 points

33 comments · 8 min read · LW link

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai

Jan 5, 2025, 2:49 PM

100 points

12 comments · 12 min read · LW link

The purposeful drunkard

Dmitry Vaintrob

Jan 12, 2025, 12:27 PM

98 points

13 comments · 6 min read · LW link

The subset parity learning problem: much more than you wanted to know

Dmitry Vaintrob

Jan 3, 2025, 9:13 AM

94 points

18 comments · 11 min read · LW link

Tips and Code for Empirical Research Workflows

John Hughes and Ethan Perez

Jan 20, 2025, 10:31 PM

94 points

14 comments · 20 min read · LW link

On Eating the Sun

jessicata

Jan 8, 2025, 4:57 AM

94 points

96 comments · 3 min read · LW link

(unstablerontology.substack.com)

We probably won’t just play status games with each other after AGI

Matthew Barnett

Jan 15, 2025, 4:56 AM

93 points

21 comments · 4 min read · LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd

Jan 14, 2025, 2:14 AM

93 points

70 comments · 5 min read · LW link

Five Recent AI Tutoring Studies

Arjun Panickssery

Jan 19, 2025, 3:53 AM

93 points

0 comments · 2 min read · LW link

(arjunpanickssery.substack.com)

The Rising Sea

Jesse Hoogland

Jan 25, 2025, 8:48 PM

92 points

2 comments · 2 min read · LW link

Introducing Squiggle AI

ozziegooen

Jan 3, 2025, 5:53 PM

92 points

15 comments · LW link

Thoughts on the conservative assumptions in AI control

Buck

Jan 17, 2025, 7:23 PM

91 points

5 comments · 13 min read · LW link

Six Thoughts on AI Safety

boazbarak

Jan 24, 2025, 10:20 PM

91 points

55 comments · 15 min read · LW link

Tips On Empirical Research Slides

James Chua, John Hughes, Ethan Perez and Owain_Evans

Jan 8, 2025, 5:06 AM

90 points

4 comments · 6 min read · LW link

Agent Foundations 2025 at CMU

Alexander Gietelink Oldenziel and windows

Jan 19, 2025, 11:48 PM

90 points

10 comments · 1 min read · LW link