(My understanding of) What Everyone in Technical Alignment is Doing and Why

29 Aug 2022 1:23 UTC
413 points
90 comments · 37 min read · 1 review

DeepMind alignment team opinions on AGI ruin arguments

Vika · 12 Aug 2022 21:06 UTC
395 points
37 comments · 14 min read · 1 review

A Mechanistic Interpretability Analysis of Grokking

15 Aug 2022 2:41 UTC
373 points
47 comments · 36 min read · 1 review
(colab.research.google.com)

Two-year update on my personal AI timelines

Ajeya Cotra · 2 Aug 2022 23:07 UTC
293 points
60 comments · 16 min read

Common misconceptions about OpenAI

Jacob_Hilton · 25 Aug 2022 14:02 UTC
251 points
147 comments · 5 min read · 1 review

What do ML researchers think about AI in 2022?

KatjaGrace · 4 Aug 2022 15:40 UTC
221 points
33 comments · 3 min read
(aiimpacts.org)

Worlds Where Iterative Design Fails

johnswentworth · 30 Aug 2022 20:48 UTC
207 points
30 comments · 10 min read · 1 review

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth · 10 Aug 2022 16:08 UTC
202 points
34 comments · 3 min read · 1 review

Language models seem to be much better than humans at next-token prediction

11 Aug 2022 17:45 UTC
182 points
60 comments · 13 min read · 1 review

Some conceptual alignment research projects

Richard_Ngo · 25 Aug 2022 22:51 UTC
176 points
15 comments · 3 min read

Shard Theory: An Overview

David Udell · 11 Aug 2022 5:44 UTC
166 points
34 comments · 10 min read

Your posts should be on arXiv

JanB · 25 Aug 2022 10:35 UTC
155 points
44 comments · 3 min read

Nate Soares’ Life Advice

CatGoddess · 23 Aug 2022 2:46 UTC
151 points
41 comments · 3 min read

What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?

johnswentworth · 15 Aug 2022 22:48 UTC
146 points
18 comments · 10 min read

The Parable of the Boy Who Cried 5% Chance of Wolf

KatWoods · 15 Aug 2022 14:33 UTC
140 points
24 comments · 2 min read

How might we align transformative AI if it’s developed very soon?

HoldenKarnofsky · 29 Aug 2022 15:42 UTC
139 points
55 comments · 45 min read · 1 review

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth · 8 Aug 2022 18:05 UTC
130 points
12 comments · 3 min read

Externalized reasoning oversight: a research direction for language model alignment

tamera · 3 Aug 2022 12:03 UTC
130 points
23 comments · 6 min read

Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!”

eukaryote · 4 Aug 2022 20:37 UTC
124 points
15 comments · 9 min read · 1 review
(eukaryotewritesblog.com)

Taking the parameters which seem to matter and rotating them until they don’t

Garrett Baker · 26 Aug 2022 18:26 UTC
120 points
48 comments · 1 min read

Meditation course claims 65% enlightenment rate: my review

KatWoods · 1 Aug 2022 11:25 UTC
111 points
35 comments · 14 min read

The lessons of Xanadu

jasoncrawford · 7 Aug 2022 17:59 UTC
110 points
20 comments · 8 min read
(jasoncrawford.org)

The alignment problem from a deep learning perspective

Richard_Ngo · 10 Aug 2022 22:46 UTC
107 points
15 comments · 27 min read · 1 review

Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie · 24 Aug 2022 18:37 UTC
107 points
4 comments · 7 min read

Announcing Encultured AI: Building a Video Game

18 Aug 2022 2:16 UTC
103 points
26 comments · 4 min read

How likely is deceptive alignment?

evhub · 30 Aug 2022 19:34 UTC
103 points
28 comments · 60 min read

Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth · 12 Aug 2022 16:30 UTC
101 points
49 comments · 1 min read

everything is okay

Tamsin Leake · 23 Aug 2022 9:20 UTC
100 points
22 comments · 7 min read · 2 reviews

Introducing Pastcasting: A tool for forecasting practice

Sage Future · 11 Aug 2022 17:38 UTC
95 points
10 comments · 2 min read · 2 reviews

Survey advice

KatjaGrace · 24 Aug 2022 3:10 UTC
93 points
11 comments · 3 min read
(worldspiritsockpuppet.com)

Rant on Problem Factorization for Alignment

johnswentworth · 5 Aug 2022 19:23 UTC
92 points
53 comments · 6 min read

How to do theoretical research, a personal perspective

Mark Xu · 19 Aug 2022 19:41 UTC
91 points
6 comments · 15 min read

Less Threat-Dependent Bargaining Solutions?? (3/2)

Diffractor · 20 Aug 2022 2:19 UTC
88 points
7 comments · 6 min read

[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?

TurnTrout · 11 Aug 2022 22:22 UTC
87 points
42 comments · 2 min read

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

12 Aug 2022 15:17 UTC
86 points
4 comments · 3 min read · 1 review
(vkrakovna.wordpress.com)

High Reliability Orgs, and AI Companies

Raemon · 4 Aug 2022 5:45 UTC
86 points
7 comments · 12 min read · 1 review

I’m mildly skeptical that blindness prevents schizophrenia

Steven Byrnes · 15 Aug 2022 23:36 UTC
83 points
9 comments · 4 min read

Most Ivy-smart students aren’t at Ivy-tier schools

Aaron Bergman · 7 Aug 2022 3:18 UTC
82 points
7 comments · 8 min read
(www.aaronbergman.net)

«Boundaries», Part 2: trends in EA’s handling of boundaries

Andrew_Critch · 6 Aug 2022 0:42 UTC
81 points
14 comments · 7 min read

Human Mimicry Mainly Works When We’re Already Close

johnswentworth · 17 Aug 2022 18:41 UTC
81 points
16 comments · 5 min read

Paper is published! 100,000 lumens to treat seasonal affective disorder

Fabienne · 20 Aug 2022 19:48 UTC
81 points
3 comments · 1 min read
(www.lesswrong.com)

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon · 20 Aug 2022 19:48 UTC
80 points
125 comments · 1 min read

The Loire Is Not Dry

jefftk · 20 Aug 2022 13:40 UTC
80 points
2 comments · 1 min read
(www.jefftk.com)

Evolution is a bad analogy for AGI: inner alignment

Quintin Pope · 13 Aug 2022 22:15 UTC
79 points
15 comments · 8 min read

How (not) to choose a research project

9 Aug 2022 0:26 UTC
79 points
11 comments · 7 min read

AI strategy nearcasting

HoldenKarnofsky · 25 Aug 2022 17:26 UTC
79 points
4 comments · 9 min read

The Core of the Alignment Problem is...

17 Aug 2022 20:07 UTC
76 points
10 comments · 9 min read

Announcing the Introduction to ML Safety course

6 Aug 2022 2:46 UTC
73 points
6 comments · 7 min read

Discovering Agents

zac_kenton · 18 Aug 2022 17:33 UTC
73 points
11 comments · 6 min read

[Question] COVID-19 Group Testing Post-mortem?

gwern · 5 Aug 2022 16:32 UTC
72 points
6 comments · 2 min read