LessWrong Archive: February 2025
How to Make Superbabies · GeneSmith and kman · Feb 19, 2025, 8:39 PM · 591 points · 337 comments · 31 min read
How AI Takeover Might Happen in 2 Years · joshc · Feb 7, 2025, 5:10 PM · 416 points · 137 comments · 29 min read · (x.com)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · Jan Betley and Owain_Evans · Feb 25, 2025, 5:39 PM · 328 points · 91 comments · 4 min read
Murder plots are infohazards · Chris Monteiro · Feb 13, 2025, 7:15 PM · 300 points · 44 comments · 2 min read
So You Want To Make Marginal Progress... · johnswentworth · Feb 7, 2025, 11:22 PM · 284 points · 42 comments · 4 min read
Arbital has been imported to LessWrong · RobertM, jimrandomh, Ben Pace and Ruby · Feb 20, 2025, 12:47 AM · 279 points · 30 comments · 5 min read
A History of the Future, 2025-2040 · L Rudolf L · Feb 17, 2025, 12:03 PM · 231 points · 41 comments · 75 min read · (nosetgauge.substack.com)
Power Lies Trembling: a three-book review · Richard_Ngo · Feb 22, 2025, 10:57 PM · 211 points · 27 comments · 15 min read · (www.mindthefuture.info)
Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion? · garrison · Feb 11, 2025, 12:20 AM · 208 points · 8 comments · (garrisonlovely.substack.com)
Eliezer’s Lost Alignment Articles / The Arbital Sequence · Ruby and RobertM · Feb 20, 2025, 12:48 AM · 207 points · 9 comments · 5 min read
[Question] Have LLMs Generated Novel Insights? · abramdemski and Cole Wyeth · Feb 23, 2025, 6:22 PM · 155 points · 36 comments · 2 min read
It’s been ten years. I propose HPMOR Anniversary Parties. · Screwtape · Feb 16, 2025, 1:43 AM · 153 points · 3 comments · 1 min read
Levels of Friction · Zvi · Feb 10, 2025, 1:10 PM · 148 points · 8 comments · 12 min read · (thezvi.wordpress.com)
The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better · Thane Ruthenis · Feb 21, 2025, 8:15 PM · 148 points · 51 comments · 6 min read
A computational no-coincidence principle · Eric Neyman · Feb 14, 2025, 9:39 PM · 148 points · 38 comments · 6 min read · (www.alignment.org)
The Paris AI Anti-Safety Summit · Zvi · Feb 12, 2025, 2:00 PM · 129 points · 21 comments · 21 min read · (thezvi.wordpress.com)
Gradual Disempowerment, Shell Games and Flinches · Jan_Kulveit · Feb 2, 2025, 2:47 PM · 126 points · 36 comments · 6 min read
Research directions Open Phil wants to fund in technical AI safety · jake_mendel, maxnadeau and Peter Favaloro · Feb 8, 2025, 1:40 AM · 116 points · 21 comments · 58 min read · (www.openphilanthropy.org)
The News is Never Neglected · lsusr · Feb 11, 2025, 2:59 PM · 111 points · 18 comments · 1 min read
Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas · jake_mendel, maxnadeau and Peter Favaloro · Feb 6, 2025, 6:58 PM · 111 points · 0 comments · 1 min read · (www.openphilanthropy.org)
Two hemispheres—I do not think it means what you think it means · Viliam · Feb 9, 2025, 3:33 PM · 108 points · 21 comments · 14 min read
You can just wear a suit · lsusr · Feb 26, 2025, 2:57 PM · 108 points · 48 comments · 2 min read
My model of what is going on with LLMs · Cole Wyeth · Feb 13, 2025, 3:43 AM · 104 points · 49 comments · 7 min read
Judgements: Merging Prediction & Evidence · abramdemski · Feb 23, 2025, 7:35 PM · 103 points · 5 comments · 6 min read
Detecting Strategic Deception Using Linear Probes · Nicholas Goldowsky-Dill, bilalchughtai, StefanHex and Marius Hobbhahn · Feb 6, 2025, 3:46 PM · 102 points · 9 comments · 2 min read · (arxiv.org)
AGI Safety & Alignment @ Google DeepMind is hiring · Rohin Shah · Feb 17, 2025, 9:11 PM · 102 points · 19 comments · 10 min read
A short course on AGI safety from the GDM Alignment team · Vika and Rohin Shah · Feb 14, 2025, 3:43 PM · 101 points · 1 comment · 1 min read · (deepmindsafetyresearch.medium.com)
C’mon guys, Deliberate Practice is Real · Raemon · Feb 5, 2025, 10:33 PM · 98 points · 25 comments · 9 min read
Timaeus in 2024 · Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel and Daniel Murfet · Feb 20, 2025, 11:54 PM · 96 points · 1 comment · 8 min read
Reviewing LessWrong: Screwtape’s Basic Answer · Screwtape · Feb 5, 2025, 4:30 AM · 96 points · 18 comments · 6 min read
Dear AGI, · Nathan Young · Feb 18, 2025, 10:48 AM · 90 points · 11 comments · 3 min read
Wired on: “DOGE personnel with admin access to Federal Payment System” · Raemon · Feb 5, 2025, 9:32 PM · 88 points · 45 comments · 2 min read · (web.archive.org)
Anthropic releases Claude 3.7 Sonnet with extended thinking mode · LawrenceC · Feb 24, 2025, 7:32 PM · 88 points · 8 comments · 4 min read · (www.anthropic.com)
The Risk of Gradual Disempowerment from AI · Zvi · Feb 5, 2025, 10:10 PM · 86 points · 15 comments · 20 min read · (thezvi.wordpress.com)
Voting Results for the 2023 Review · Raemon · Feb 6, 2025, 8:00 AM · 86 points · 3 comments · 69 min read
How might we safely pass the buck to AI? · joshc · Feb 19, 2025, 5:48 PM · 83 points · 58 comments · 31 min read
Ambiguous out-of-distribution generalization on an algorithmic task · Wilson Wu and Louis Jaburi · Feb 13, 2025, 6:24 PM · 83 points · 6 comments · 11 min read
The Mask Comes Off: A Trio of Tales · Zvi · Feb 14, 2025, 3:30 PM · 81 points · 1 comment · 13 min read · (thezvi.wordpress.com)
Microplastics: Much Less Than You Wanted To Know · jenn, kaleb and Brent · Feb 15, 2025, 7:08 PM · 80 points · 8 comments · 13 min read
[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations · Lucy Farnik · Feb 26, 2025, 12:50 PM · 79 points · 8 comments · 7 min read
OpenAI releases deep research agent · Seth Herd · Feb 3, 2025, 12:48 PM · 78 points · 21 comments · 3 min read · (openai.com)
Pick two: concise, comprehensive, or clear rules · Screwtape · Feb 3, 2025, 6:39 AM · 78 points · 27 comments · 8 min read
Evaluating “What 2026 Looks Like” So Far · Jonny Spicer · Feb 24, 2025, 6:55 PM · 77 points · 5 comments · 7 min read
Anti-Slop Interventions? · abramdemski · Feb 4, 2025, 7:50 PM · 76 points · 33 comments · 6 min read
The Simplest Good · Jesse Hoogland · Feb 2, 2025, 7:51 PM · 75 points · 6 comments · 5 min read
MATS Applications + Research Directions I’m Currently Excited About · Neel Nanda · Feb 6, 2025, 11:03 AM · 73 points · 7 comments · 8 min read
Osaka · lsusr · Feb 26, 2025, 1:50 PM · 72 points · 11 comments · 1 min read
A Problem to Solve Before Building a Deception Detector · Eleni Angelou and lewis smith · Feb 7, 2025, 7:35 PM · 71 points · 12 comments · 14 min read
Thermodynamic entropy = Kolmogorov complexity · Aram Ebtekar · Feb 17, 2025, 5:56 AM · 70 points · 12 comments · 1 min read · (doi.org)
Language Models Use Trigonometry to Do Addition · Subhash Kantamneni · Feb 5, 2025, 1:50 PM · 70 points · 1 comment · 10 min read