I would have shit in that alley, too

Declan Molony

Jun 18, 2024, 4:41 AM

462 points

134 comments 4 min read LW link

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch

Jun 14, 2024, 12:16 AM

357 points

38 comments 4 min read LW link

My AI Model Delta Compared To Yudkowsky

johnswentworth

Jun 10, 2024, 4:12 PM

280 points

103 comments 4 min read LW link

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt

Jun 17, 2024, 6:44 PM

263 points

50 comments 13 min read LW link

SAE feature geometry is outside the superposition hypothesis

jake_mendel

Jun 24, 2024, 4:07 PM

228 points

17 comments 11 min read LW link

LLM Generality is a Timeline Crux

eggsyntax

Jun 24, 2024, 12:52 PM

218 points

119 comments 7 min read LW link

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger

Jun 6, 2024, 10:57 PM

194 points

27 comments 3 min read LW link

Two easy things that maybe Just Work to improve AI discourse

Bird Concept

Jun 8, 2024, 3:51 PM

191 points

35 comments 2 min read LW link

My AI Model Delta Compared To Christiano

johnswentworth

Jun 12, 2024, 6:19 PM

191 points

73 comments 4 min read LW link

Humming is not a free $100 bill

Elizabeth

Jun 6, 2024, 8:10 PM

185 points

6 comments 3 min read LW link

(acesounderglass.com)

Boycott OpenAI

PeterMcCluskey

Jun 18, 2024, 7:52 PM

164 points

26 comments 1 min read LW link

(bayesianinvestor.com)

Announcing ILIAD — Theoretical AI Alignment Conference

Nora_Ammann and Alexander Gietelink Oldenziel

Jun 5, 2024, 9:37 AM

163 points

18 comments 2 min read LW link

Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data

Johannes Treutlein and Owain_Evans

Jun 21, 2024, 3:54 PM

163 points

13 comments 8 min read LW link

(arxiv.org)

Sycophancy to subterfuge: Investigating reward tampering in large language models

Carson Denison and evhub

Jun 17, 2024, 6:41 PM

161 points

22 comments 8 min read LW link

(arxiv.org)

The Incredible Fentanyl-Detecting Machine

sarahconstantin

Jun 28, 2024, 10:10 PM

156 points

26 comments 7 min read LW link

(sarahconstantin.substack.com)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton

Jun 25, 2024, 3:40 PM

156 points

11 comments 9 min read LW link

(www.alignment.org)

0. CAST: Corrigibility as Singular Target

Max Harms

Jun 7, 2024, 10:29 PM

147 points

17 comments 8 min read LW link

Loving a world you don’t trust

Joe Carlsmith

Jun 18, 2024, 7:31 PM

135 points

13 comments 33 min read LW link

How it All Went Down: The Puzzle Hunt that took us way, way Less Online

A*

Jun 2, 2024, 8:01 AM

135 points

5 comments 5 min read LW link

Why I don’t believe in the placebo effect

transhumanist_atom_understander

Jun 10, 2024, 2:37 AM

134 points

22 comments 9 min read LW link

The Standard Analogy

Zack_M_Davis

Jun 3, 2024, 5:15 PM

125 points

28 comments 12 min read LW link

[Question] What do coherence arguments actually prove about agentic behavior?

sunwillrise

Jun 1, 2024, 9:37 AM

123 points

39 comments 6 min read LW link

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner

Jun 4, 2024, 3:50 PM

121 points

14 comments 13 min read LW link

AI catastrophes and rogue deployments

Buck

Jun 3, 2024, 5:04 PM

120 points

16 comments 8 min read LW link

Anthropic’s Certificate of Incorporation

Zach Stein-Perlman

Jun 12, 2024, 1:00 PM

115 points

7 comments 4 min read LW link

The Leopold Model: Analysis and Reactions

Zvi

Jun 14, 2024, 3:10 PM

109 points

19 comments 57 min read LW link

(thezvi.wordpress.com)

Scaling and evaluating sparse autoencoders

leogao

Jun 6, 2024, 10:50 PM

106 points

6 comments 1 min read LW link

Demystifying “Alignment” through a Comic

milanrosko

Jun 9, 2024, 8:24 AM

106 points

19 comments 1 min read LW link

In favour of exploring nagging doubts about x-risk

owencb

Jun 25, 2024, 11:52 PM

105 points

2 comments LW link

Access to powerful AI might make computer security radically easier

Buck

Jun 8, 2024, 6:00 AM

105 points

14 comments 6 min read LW link

The Minority Coalition

Richard_Ngo

Jun 24, 2024, 8:01 PM

103 points

9 comments 5 min read LW link

(www.narrativeark.xyz)

On Dwarkesh’s Podcast with Leopold Aschenbrenner

Zvi

Jun 10, 2024, 12:40 PM

102 points

7 comments 59 min read LW link

(thezvi.wordpress.com)

Live Theory Part 0: Taking Intelligence Seriously

Sahil

Jun 26, 2024, 9:37 PM

101 points

3 comments 8 min read LW link

Comments on Anthropic’s Scaling Monosemanticity

Robert_AIZI

Jun 3, 2024, 12:15 PM

98 points

8 comments 7 min read LW link

CIV: a story

Richard_Ngo

Jun 15, 2024, 10:36 PM

98 points

6 comments 9 min read LW link

(www.narrativeark.xyz)

OpenAI #8: The Right to Warn

Zvi

Jun 17, 2024, 12:00 PM

97 points

8 comments 34 min read LW link

(thezvi.wordpress.com)

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper

Zvi

Jun 7, 2024, 11:40 AM

97 points

10 comments 37 min read LW link

(thezvi.wordpress.com)

Compact Proofs of Model Performance via Mechanistic Interpretability

LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross

Jun 24, 2024, 7:27 PM

96 points

4 comments 8 min read LW link

(arxiv.org)

On Claude 3.5 Sonnet

Zvi

Jun 24, 2024, 12:00 PM

95 points

14 comments 13 min read LW link

(thezvi.wordpress.com)

Ilya Sutskever created a new AGI startup

harfe

Jun 19, 2024, 5:17 PM

95 points

35 comments 1 min read LW link

(ssi.inc)

Towards a Less Bullshit Model of Semantics

johnswentworth and David Lorell

Jun 17, 2024, 3:51 PM

94 points

44 comments 21 min read LW link

Takeoff speeds presentation at Anthropic

Tom Davidson

Jun 4, 2024, 10:46 PM

92 points

0 comments 25 min read LW link

Just admit that you’ve zoned out

joec

Jun 4, 2024, 2:51 AM

91 points

22 comments 2 min read LW link

Detecting Genetically Engineered Viruses With Metagenomic Sequencing

jefftk

Jun 27, 2024, 2:01 PM

87 points

10 comments LW link

(naobservatory.org)

I’m a bit skeptical of AlphaFold 3

Oleg Trott

Jun 25, 2024, 12:04 AM

87 points

14 comments 2 min read LW link

[Paper] Stress-testing capability elicitation with password-locked models

Fabien Roger and ryan_greenblatt

Jun 4, 2024, 2:52 PM

85 points

10 comments 12 min read LW link

(arxiv.org)

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward

Jun 13, 2024, 10:04 AM

84 points

10 comments 2 min read LW link

(arxiv.org)

Actually, Power Plants May Be an AI Training Bottleneck.

Lao Mein

Jun 20, 2024, 4:41 AM

83 points

13 comments 2 min read LW link

Secondary forces of debt

KatjaGrace

Jun 27, 2024, 9:10 PM

81 points

18 comments 2 min read LW link

(worldspiritsockpuppet.com)

AI takeoff and nuclear war

owencb

Jun 11, 2024, 7:36 PM

80 points

6 comments 11 min read LW link

(strangecities.substack.com)
