LessWrong’s (first) album: I Have Been A Good Bing

1 Apr 2024 7:33 UTC
564 points
174 comments · 11 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai · 16 Apr 2024 21:16 UTC
411 points
100 comments · 12 min read · LW link

Thoughts on seed oil

dynomight · 20 Apr 2024 12:29 UTC
347 points
128 comments · 17 min read · LW link
(dynomight.net)

[April Fools’ Day] Introducing Open Asteroid Impact

Linch · 1 Apr 2024 8:14 UTC
334 points
29 comments · 1 min read · LW link
(openasteroidimpact.org)

Express interest in an “FHI of the West”

habryka · 18 Apr 2024 3:32 UTC
268 points
41 comments · 3 min read · LW link

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget · 16 Apr 2024 16:22 UTC
256 points
58 comments · 1 min read · LW link
(www.commerce.gov)

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
228 points
93 comments · 10 min read · LW link

Funny Anecdote of Eliezer From His Sister

Noah Birnbaum · 22 Apr 2024 22:05 UTC
198 points
6 comments · 2 min read · LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth · 23 Apr 2024 22:19 UTC
194 points
100 comments · 1 min read · LW link

On Not Pulling The Ladder Up Behind You

Screwtape · 26 Apr 2024 21:58 UTC
188 points
21 comments · 9 min read · LW link

OMMC Announces RIP

1 Apr 2024 23:20 UTC
188 points
5 comments · 2 min read · LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

18 Apr 2024 0:27 UTC
184 points
21 comments · 7 min read · LW link

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern · 17 Apr 2024 13:54 UTC
176 points
22 comments · 1 min read · LW link
(www.futureofhumanityinstitute.org)

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein · 15 Apr 2024 7:02 UTC
168 points
43 comments · 4 min read · LW link

Ironing Out the Squiggles

Zack_M_Davis · 29 Apr 2024 16:13 UTC
153 points
36 comments · 11 min read · LW link

Daniel Dennett has died (1942–2024)

kave · 19 Apr 2024 16:17 UTC
150 points
5 comments · 1 min read · LW link
(dailynous.com)

Priors and Prejudice

MathiasKB · 22 Apr 2024 15:00 UTC
149 points
31 comments · 7 min read · LW link

LLMs for Alignment Research: a safety priority?

abramdemski · 4 Apr 2024 20:03 UTC
145 points
24 comments · 11 min read · LW link

My experience using financial commitments to overcome akrasia

William Howard · 15 Apr 2024 22:57 UTC
137 points
31 comments · 18 min read · LW link

When is a mind me?

Rob Bensinger · 17 Apr 2024 5:56 UTC
135 points
125 comments · 15 min read · LW link

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
133 points
21 comments · 1 min read · LW link
(www.anthropic.com)

A Dozen Ways to Get More Dakka

Davidmanheim · 8 Apr 2024 4:45 UTC
130 points
11 comments · 3 min read · LW link

RTFB: On the New Proposed CAIP AI Bill

Zvi · 10 Apr 2024 18:30 UTC
119 points
14 comments · 34 min read · LW link
(thezvi.wordpress.com)

A Selection of Randomly Selected SAE Features

1 Apr 2024 9:09 UTC
109 points
2 comments · 4 min read · LW link

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks · 18 Apr 2024 16:17 UTC
107 points
10 comments · 12 min read · LW link

The first future and the best future

KatjaGrace · 25 Apr 2024 6:40 UTC
106 points
12 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

[Question] What convincing warning shot could help prevent extinction from AI?

13 Apr 2024 18:09 UTC
105 points
18 comments · 2 min read · LW link

Carl Sagan, nuking the moon, and not nuking the moon

eukaryote · 13 Apr 2024 4:08 UTC
103 points
8 comments · 6 min read · LW link
(eukaryotewritesblog.com)

MIRI’s April 2024 Newsletter

Harlan · 12 Apr 2024 23:38 UTC
95 points
0 comments · 3 min read · LW link
(intelligence.org)

Sparsify: A mechanistic interpretability research agenda

Lee Sharkey · 3 Apr 2024 12:34 UTC
94 points
22 comments · 22 min read · LW link

Partial value takeover without world takeover

KatjaGrace · 5 Apr 2024 6:20 UTC
89 points
23 comments · 3 min read · LW link
(worldspiritsockpuppet.com)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry · 29 Apr 2024 20:57 UTC
89 points
8 comments · 11 min read · LW link

Rejecting Television

Declan Molony · 23 Apr 2024 4:59 UTC
85 points
10 comments · 6 min read · LW link

Constructability: Plainly-coded AGIs may be feasible in the near future

27 Apr 2024 16:04 UTC
82 points
13 comments · 13 min read · LW link

Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes

16 Apr 2024 10:10 UTC
82 points
12 comments · 8 min read · LW link
(blog.aiimpacts.org)

A couple productivity tips for overthinkers

Steven Byrnes · 20 Apr 2024 16:05 UTC
78 points
13 comments · 4 min read · LW link

Creating unrestricted AI Agents with Command R+

Simon Lermen · 16 Apr 2024 14:52 UTC
77 points
13 comments · 5 min read · LW link

Mid-conditional love

KatjaGrace · 17 Apr 2024 4:00 UTC
76 points
21 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Coherence of Caches and Agents

johnswentworth · 1 Apr 2024 23:04 UTC
76 points
9 comments · 11 min read · LW link

AISC9 has ended and there will be an AISC10

Linda Linsefors · 29 Apr 2024 10:53 UTC
75 points
4 comments · 2 min read · LW link

[Full Post] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
73 points
10 comments · 8 min read · LW link

A Gentle Introduction to Risk Frameworks Beyond Forecasting

pendingsurvival · 11 Apr 2024 18:03 UTC
73 points
10 comments · 27 min read · LW link

Announcing Suffering For Good

Garrett Baker · 1 Apr 2024 17:08 UTC
72 points
5 comments · 1 min read · LW link

Prompts for Big-Picture Planning

Raemon · 13 Apr 2024 3:04 UTC
72 points
1 comment · 3 min read · LW link

LW Frontpage Experiments! (aka “Take the wheel, Shoggoth!”)

23 Apr 2024 3:58 UTC
71 points
27 comments · 5 min read · LW link

Text Posts from the Kids Group: 2020

jefftk · 13 Apr 2024 22:30 UTC
69 points
3 comments · 19 min read · LW link
(www.jefftk.com)

Motivation gaps: Why so much EA criticism is hostile and lazy

titotal · 22 Apr 2024 11:49 UTC
69 points
5 comments · 1 min read · LW link
(titotal.substack.com)

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt

DanielFilan · 11 Apr 2024 21:30 UTC
69 points
10 comments · 107 min read · LW link

The Inner Ring by C. S. Lewis

Saul Munn · 24 Apr 2024 22:48 UTC
69 points
6 comments · 13 min read · LW link
(www.lewissociety.org)

How We Picture Bayesian Agents

8 Apr 2024 18:12 UTC
69 points
14 comments · 7 min read · LW link