Truthseeking is the ground in which other principles grow

Elizabeth · May 27, 2024, 1:09 AM
248 points
16 comments · 16 min read · LW link

Principles for the AGI Race

William_S · Aug 30, 2024, 2:29 PM
248 points
17 comments · 18 min read · LW link

My Clients, The Liars

ymeskhout · Mar 5, 2024, 9:06 PM
247 points
86 comments · 7 min read · LW link

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman · May 15, 2024, 12:45 AM
246 points
95 comments · 2 min read · LW link

Refusal in LLMs is mediated by a single direction

Apr 27, 2024, 11:13 AM
246 points
95 comments · 10 min read · LW link

AI companies aren’t really using external evaluators

Zach Stein-Perlman · May 24, 2024, 4:01 PM
242 points
15 comments · 4 min read · LW link

Believing In

AnnaSalamon · Feb 8, 2024, 7:06 AM
239 points
51 comments · 13 min read · LW link

Explore More: A Bag of Tricks to Keep Your Life on the Rails

Shoshannah Tekofsky · Sep 28, 2024, 9:38 PM
235 points
19 comments · 11 min read · LW link
(shoshanigans.substack.com)

“How could I have thought that faster?”

mesaoptimizer · Mar 11, 2024, 10:56 AM
234 points
32 comments · 2 min read · LW link
(twitter.com)

You are not too “irrational” to know your preferences.

DaystarEld · Nov 26, 2024, 3:01 PM
231 points
50 comments · 13 min read · LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith · Aug 2, 2024, 2:33 PM
231 points
19 comments · 17 min read · LW link

SAE feature geometry is outside the superposition hypothesis

jake_mendel · Jun 24, 2024, 4:07 PM
228 points
17 comments · 11 min read · LW link

Introducing AI Lab Watch

Zach Stein-Perlman · Apr 30, 2024, 5:00 PM
224 points
30 comments · 1 min read · LW link
(ailabwatch.org)

MIRI 2024 Mission and Strategy Update

Malo · Jan 5, 2024, 12:20 AM
223 points
44 comments · 8 min read · LW link

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work

Aug 20, 2024, 4:22 PM
222 points
33 comments · 9 min read · LW link

The Hopium Wars: the AGI Entente Delusion

Max Tegmark · Oct 13, 2024, 5:00 PM
221 points
60 comments · 9 min read · LW link

Modern Transformers are AGI, and Human-Level

abramdemski · Mar 26, 2024, 5:46 PM
219 points
87 comments · 5 min read · LW link

Ayn Rand’s model of “living money”; and an upside of burnout

AnnaSalamon · Nov 16, 2024, 2:59 AM
218 points
58 comments · 5 min read · LW link

LLM Generality is a Timeline Crux

eggsyntax · Jun 24, 2024, 12:52 PM
217 points
119 comments · 7 min read · LW link

CFAR Takeaways: Andrew Critch

Raemon · Feb 14, 2024, 1:37 AM
217 points
64 comments · 5 min read · LW link

“Slow” takeoff is a terrible term for “maybe even faster takeoff, actually”

Raemon · Sep 28, 2024, 11:38 PM
217 points
69 comments · 1 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · Dec 26, 2024, 4:49 PM
216 points
13 comments · 8 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
215 points
49 comments · 12 min read · LW link

Superbabies: Putting The Pieces Together

sarahconstantin · Jul 11, 2024, 8:40 PM
215 points
37 comments · 10 min read · LW link
(sarahconstantin.substack.com)

Understanding Shapley Values with Venn Diagrams

Carson L · Dec 6, 2024, 9:56 PM
214 points
34 comments · LW link
(medium.com)

Optimistic Assumptions, Longterm Planning, and “Cope”

Raemon · Jul 17, 2024, 10:14 PM
214 points
46 comments · 7 min read · LW link

ChatGPT can learn indirect control

Raymond D · Mar 21, 2024, 9:11 PM
213 points
27 comments · 1 min read · LW link

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl · Sep 7, 2024, 2:37 AM
212 points
19 comments · 1 min read · LW link

Towards more cooperative AI safety strategies

Richard_Ngo · Jul 16, 2024, 4:36 AM
210 points
133 comments · 4 min read · LW link

Making a conservative case for alignment

Nov 15, 2024, 6:55 PM
208 points
67 comments · 7 min read · LW link

Mechanistically Eliciting Latent Behaviors in Language Models

Apr 30, 2024, 6:51 PM
208 points
43 comments · 45 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · Feb 3, 2024, 8:36 PM
207 points
156 comments · 9 min read · LW link

Why I’m not a Bayesian

Richard_Ngo · Oct 6, 2024, 3:22 PM
207 points
101 comments · 10 min read · LW link
(www.mindthefuture.info)

What TMS is like

Sable · Oct 31, 2024, 12:44 AM
206 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

Funny Anecdote of Eliezer From His Sister

Noah Birnbaum · Apr 22, 2024, 10:05 PM
206 points
6 comments · 2 min read · LW link

The Sun is big, but superintelligences will not spare Earth a little sunlight

Eliezer Yudkowsky · Sep 23, 2024, 3:39 AM
205 points
142 comments · 13 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

Jan 18, 2024, 9:06 PM
204 points
18 comments · 63 min read · LW link

OpenAI: Fallout

Zvi · May 28, 2024, 1:20 PM
204 points
25 comments · 36 min read · LW link
(thezvi.wordpress.com)

Frontier Models are Capable of In-context Scheming

Dec 5, 2024, 10:11 PM
203 points
24 comments · 7 min read · LW link

Jaan Tallinn’s 2023 Philanthropy Overview

jaan · May 20, 2024, 12:11 PM
203 points
5 comments · 1 min read · LW link
(jaan.info)

Communications in Hard Mode (My new job at MIRI)

tanagrabeast · Dec 13, 2024, 8:13 PM
202 points
25 comments · 5 min read · LW link

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · May 27, 2024, 1:00 PM
201 points
21 comments · 2 min read · LW link

Cryonics is free

Mati_Roy · Sep 29, 2024, 5:58 PM
198 points
43 comments · 2 min read · LW link

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage

orthonormal · Aug 6, 2024, 2:32 AM
198 points
30 comments · 3 min read · LW link

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison · Feb 10, 2024, 7:52 PM
198 points
52 comments · LW link
(garrisonlovely.substack.com)

The impossible problem of due process

mingyuan · Jan 16, 2024, 5:18 AM
197 points
64 comments · 14 min read · LW link

This might be the last AI Safety Camp

Jan 24, 2024, 9:33 AM
196 points
34 comments · 1 min read · LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth · Apr 23, 2024, 10:19 PM
194 points
102 comments · 1 min read · LW link

The Compendium, A full argument about extinction risk from AGI

Oct 31, 2024, 12:01 PM
194 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger · Jun 6, 2024, 10:57 PM
194 points
27 comments · 3 min read · LW link