LessWrong’s (first) album: I Have Been A Good Bing

1 Apr 2024 7:33 UTC
531 points
159 comments · 11 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai · 16 Apr 2024 21:16 UTC
368 points
83 comments · 12 min read · LW link

There is way too much serendipity

Malmesbury · 19 Jan 2024 19:37 UTC
350 points
56 comments · 7 min read · LW link

[April Fools’ Day] Introducing Open Asteroid Impact

Linch · 1 Apr 2024 8:14 UTC
324 points
29 comments · 1 min read · LW link
(openasteroidimpact.org)

The Best Tacit Knowledge Videos on Every Subject

Parker Conley · 31 Mar 2024 17:14 UTC
315 points
129 comments · 16 min read · LW link

Thoughts on seed oil

dynomight · 20 Apr 2024 12:29 UTC
303 points
114 comments · 17 min read · LW link
(dynomight.net)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

12 Jan 2024 19:51 UTC
291 points
94 comments · 3 min read · LW link
(arxiv.org)

My hour of memoryless lucidity

Eric Neyman · 4 May 2024 1:40 UTC
283 points
19 comments · 5 min read · LW link
(ericneyman.wordpress.com)

Gentleness and the artificial Other

Joe Carlsmith · 2 Jan 2024 18:21 UTC
267 points
33 comments · 11 min read · LW link

Scale Was All We Needed, At First

Gabe M · 14 Feb 2024 1:49 UTC
263 points
31 comments · 8 min read · LW link
(aiacumen.substack.com)

Express interest in an “FHI of the West”

habryka · 18 Apr 2024 3:32 UTC
260 points
41 comments · 3 min read · LW link

On green

Joe Carlsmith · 21 Mar 2024 17:38 UTC
258 points
34 comments · 31 min read · LW link

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget · 16 Apr 2024 16:22 UTC
254 points
59 comments · 1 min read · LW link
(www.commerce.gov)

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman · 16 Mar 2024 22:56 UTC
251 points
14 comments · 7 min read · LW link
(arxiv.org)

Failures in Kindness

silentbob · 26 Mar 2024 21:30 UTC
246 points
27 comments · 9 min read · LW link

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
245 points
66 comments · 28 min read · LW link

“No-one in my org puts money in their pension”

Tobes · 16 Feb 2024 18:33 UTC
243 points
7 comments · 9 min read · LW link
(seekingtobejolly.substack.com)

My Clients, The Liars

ymeskhout · 5 Mar 2024 21:06 UTC
231 points
85 comments · 7 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · 3 Feb 2024 20:36 UTC
220 points
156 comments · 9 min read · LW link

MIRI 2024 Mission and Strategy Update

Malo · 5 Jan 2024 0:20 UTC
216 points
44 comments · 8 min read · LW link

CFAR Takeaways: Andrew Critch

Raemon · 14 Feb 2024 1:37 UTC
213 points
62 comments · 5 min read · LW link

Believing In

AnnaSalamon · 8 Feb 2024 7:06 UTC
212 points
49 comments · 13 min read · LW link

ChatGPT can learn indirect control

Raymond D · 21 Mar 2024 21:11 UTC
212 points
23 comments · 1 min read · LW link

Modern Transformers are AGI, and Human-Level

abramdemski · 26 Mar 2024 17:46 UTC
205 points
89 comments · 5 min read · LW link

“How could I have thought that faster?”

mesaoptimizer · 11 Mar 2024 10:56 UTC
200 points
31 comments · 2 min read · LW link
(twitter.com)

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison · 10 Feb 2024 19:52 UTC
198 points
52 comments · 1 min read · LW link
(garrisonlovely.substack.com)

Funny Anecdote of Eliezer From His Sister

Daniel Birnbaum · 22 Apr 2024 22:05 UTC
197 points
5 comments · 2 min read · LW link

Introducing AI Lab Watch

Zach Stein-Perlman · 30 Apr 2024 17:00 UTC
197 points
24 comments · 1 min read · LW link
(ailabwatch.org)

My Interview With Cade Metz on His Reporting About Slate Star Codex

Zack_M_Davis · 26 Mar 2024 17:18 UTC
188 points
186 comments · 6 min read · LW link

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
185 points
79 comments · 10 min read · LW link

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

Ricki Heicklen · 22 Feb 2024 23:56 UTC
184 points
5 comments · 4 min read · LW link
(bayesshammai.substack.com)

Daniel Kahneman has died

DanielFilan · 27 Mar 2024 15:59 UTC
183 points
11 comments · 1 min read · LW link
(www.washingtonpost.com)

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
182 points
17 comments · 73 min read · LW link

This might be the last AI Safety Camp

24 Jan 2024 9:33 UTC
181 points
33 comments · 1 min read · LW link

The impossible problem of due process

mingyuan · 16 Jan 2024 5:18 UTC
180 points
63 comments · 14 min read · LW link

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
179 points
23 comments · 2 min read · LW link

OMMC Announces RIP

1 Apr 2024 23:20 UTC
178 points
5 comments · 2 min read · LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth · 23 Apr 2024 22:19 UTC
178 points
94 comments · 1 min read · LW link

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo · 16 Feb 2024 18:53 UTC
174 points
6 comments · 4 min read · LW link

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern · 17 Apr 2024 13:54 UTC
174 points
22 comments · 1 min read · LW link
(www.futureofhumanityinstitute.org)

Toward a Broader Conception of Adverse Selection

Ricki Heicklen · 14 Mar 2024 22:40 UTC
174 points
61 comments · 13 min read · LW link
(bayesshammai.substack.com)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

18 Apr 2024 0:27 UTC
171 points
18 comments · 7 min read · LW link

On Not Pulling The Ladder Up Behind You

Screwtape · 26 Apr 2024 21:58 UTC
169 points
16 comments · 9 min read · LW link

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein · 15 Apr 2024 7:02 UTC
168 points
41 comments · 4 min read · LW link

Timaeus’s First Four Months

28 Feb 2024 17:01 UTC
167 points
6 comments · 6 min read · LW link

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky · 14 Mar 2024 2:02 UTC
165 points
84 comments · 25 min read · LW link

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
160 points
60 comments · 57 min read · LW link

Mechanistically Eliciting Latent Behaviors in Language Models

30 Apr 2024 18:51 UTC
156 points
37 comments · 45 min read · LW link

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks · 3 Jan 2024 19:44 UTC
154 points
61 comments · 16 min read · LW link

Many arguments for AI x-risk are wrong

TurnTrout · 5 Mar 2024 2:31 UTC
153 points
76 comments · 12 min read · LW link