There is way too much serendipity

Malmesbury · 19 Jan 2024 19:37 UTC
365 points
56 comments · 7 min read · LW link

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

12 Jan 2024 19:51 UTC
305 points
95 comments · 3 min read · LW link
(arxiv.org)

Gentleness and the artificial Other

Joe Carlsmith · 2 Jan 2024 18:21 UTC
291 points
33 comments · 11 min read · LW link

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
258 points
66 comments · 28 min read · LW link

MIRI 2024 Mission and Strategy Update

Malo · 5 Jan 2024 0:20 UTC
219 points
44 comments · 8 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
203 points
18 comments · 63 min read · LW link

The impossible problem of due process

mingyuan · 16 Jan 2024 5:18 UTC
195 points
64 comments · 14 min read · LW link

This might be the last AI Safety Camp

24 Jan 2024 9:33 UTC
195 points
34 comments · 1 min read · LW link

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
182 points
23 comments · 2 min read · LW link

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
160 points
60 comments · 57 min read · LW link

Making every researcher seek grants is a broken model

jasoncrawford · 26 Jan 2024 16:06 UTC
158 points
41 comments · 4 min read · LW link
(rootsofprogress.org)

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks · 3 Jan 2024 19:44 UTC
157 points
61 comments · 16 min read · LW link

Apologizing is a Core Rationalist Skill

johnswentworth · 2 Jan 2024 17:47 UTC
152 points
42 comments · 5 min read · LW link

Deep atheism and AI risk

Joe Carlsmith · 4 Jan 2024 18:58 UTC
146 points
22 comments · 27 min read · LW link

What good is G-factor if you’re dumped in the woods? A field report from a camp counselor.

Hastings · 12 Jan 2024 13:17 UTC
137 points
22 comments · 1 min read · LW link

Processor clock speeds are not how fast AIs think

Ege Erdil · 29 Jan 2024 14:39 UTC
132 points
55 comments · 2 min read · LW link

The case for training frontier AIs on Sumerian-only corpus

15 Jan 2024 16:40 UTC
130 points
15 comments · 3 min read · LW link

Notice When People Are Directionally Correct

Chris_Leong · 14 Jan 2024 14:12 UTC
129 points
8 comments · 2 min read · LW link

An even deeper atheism

Joe Carlsmith · 11 Jan 2024 17:28 UTC
125 points
47 comments · 15 min read · LW link

A Shutdown Problem Proposal

21 Jan 2024 18:12 UTC
125 points
61 comments · 6 min read · LW link

Steering Llama-2 with contrastive activation additions

2 Jan 2024 0:47 UTC
123 points
29 comments · 8 min read · LW link
(arxiv.org)

Why I take short timelines seriously

NicholasKees · 28 Jan 2024 22:27 UTC
121 points
29 comments · 4 min read · LW link

Gender Exploration

sapphire · 14 Jan 2024 18:57 UTC
113 points
25 comments · 5 min read · LW link
(open.substack.com)

Four visions of Transformative AI success

Steven Byrnes · 17 Jan 2024 20:45 UTC
112 points
22 comments · 15 min read · LW link

Practically A Book Review: Appendix to “Nonlinear’s Evidence: Debunking False and Misleading Claims” (ThingOfThings)

tailcalled · 3 Jan 2024 17:07 UTC
111 points
25 comments · 2 min read · LW link
(thingofthings.substack.com)

The case for more ambitious language model evals

Jozdien · 30 Jan 2024 0:01 UTC
110 points
30 comments · 5 min read · LW link

‘ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins · 22 Jan 2024 18:47 UTC
109 points
16 comments · 45 min read · LW link

Being nicer than Clippy

Joe Carlsmith · 16 Jan 2024 19:44 UTC
109 points
32 comments · 27 min read · LW link

2023 in AI predictions

jessicata · 1 Jan 2024 5:23 UTC
107 points
35 comments · 5 min read · LW link

Catching AIs red-handed

5 Jan 2024 17:43 UTC
104 points
22 comments · 17 min read · LW link

Deceptive AI ≠ Deceptively-aligned AI

Steven Byrnes · 7 Jan 2024 16:55 UTC
96 points
19 comments · 6 min read · LW link

Almost everyone I’ve met would be well-served thinking more about what to focus on

Henrik Karlsson · 5 Jan 2024 21:01 UTC
95 points
8 comments · 11 min read · LW link
(www.henrikkarlsson.xyz)

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena · 25 Jan 2024 19:17 UTC
94 points
14 comments · 1 min read · LW link
(www.rand.org)

On the abolition of man

Joe Carlsmith · 18 Jan 2024 18:17 UTC
88 points
18 comments · 41 min read · LW link

The Aspiring Rationalist Congregation

maia · 10 Jan 2024 22:52 UTC
86 points
23 comments · 10 min read · LW link

Sparse Autoencoders Work on Attention Layer Outputs

16 Jan 2024 0:26 UTC
83 points
9 comments · 18 min read · LW link

Some Vacation Photos

johnswentworth · 4 Jan 2024 17:15 UTC
82 points
0 comments · 1 min read · LW link

An Introduction To The Mandelbrot Set That Doesn’t Mention Complex Numbers

Yitz · 17 Jan 2024 9:48 UTC
82 points
11 comments · 9 min read · LW link

Palworld development blog post

bhauth · 28 Jan 2024 5:56 UTC
81 points
12 comments · 1 min read · LW link
(note.com)

Survey of 2,778 AI authors: six parts in pictures

KatjaGrace · 6 Jan 2024 4:43 UTC
80 points
1 comment · 2 min read · LW link

Universal Love Integration Test: Hitler

Raemon · 10 Jan 2024 23:55 UTC
76 points
65 comments · 9 min read · LW link

When “yang” goes wrong

Joe Carlsmith · 8 Jan 2024 16:35 UTC
72 points
6 comments · 13 min read · LW link

We need a Science of Evals

22 Jan 2024 20:30 UTC
71 points
13 comments · 9 min read · LW link

The True Story of How GPT-2 Became Maximally Lewd

18 Jan 2024 21:03 UTC
70 points
7 comments · 6 min read · LW link
(youtu.be)

[Repost] The Copenhagen Interpretation of Ethics

mesaoptimizer · 25 Jan 2024 15:20 UTC
70 points
4 comments · 5 min read · LW link
(web.archive.org)

Epistemic Hell

rogersbacon · 27 Jan 2024 17:13 UTC
70 points
20 comments · 14 min read · LW link

InterLab – a toolkit for experiments with multi-agent interactions

22 Jan 2024 18:23 UTC
69 points
0 comments · 8 min read · LW link
(acsresearch.org)

[Question] Will quantum randomness affect the 2028 election?

24 Jan 2024 22:54 UTC
66 points
52 comments · 1 min read · LW link

OpenAI’s Preparedness Framework: Praise & Recommendations

Akash · 2 Jan 2024 16:20 UTC
66 points
1 comment · 7 min read · LW link

The Perceptron Controversy

Yuxi_Liu · 10 Jan 2024 23:07 UTC
65 points
18 comments · 1 min read · LW link
(yuxi-liu-wired.github.io)