Archive · Page 3
My AI Model Delta Compared To Christiano · johnswentworth · Jun 12, 2024, 6:19 PM · 191 points · 73 comments · 4 min read
What’s Going on With OpenAI’s Messaging? · ozziegooen · May 21, 2024, 2:22 AM · 191 points · 13 comments
Two easy things that maybe Just Work to improve AI discourse · Bird Concept · Jun 8, 2024, 3:51 PM · 190 points · 35 comments · 2 min read
OMMC Announces RIP · Adam Scholl and aysja · Apr 1, 2024, 11:20 PM · 189 points · 5 comments · 2 min read
A basic systems architecture for AI agents that do autonomous research · Buck · Sep 23, 2024, 1:58 PM · 189 points · 16 comments · 8 min read
My Interview With Cade Metz on His Reporting About Slate Star Codex · Zack_M_Davis · Mar 26, 2024, 5:18 PM · 189 points · 187 comments · 6 min read
Shallow review of technical AI safety, 2024 · technicalities, Stag, Stephen McAleese, jordine and Dr. David Mathers · Dec 29, 2024, 12:01 PM · 189 points · 34 comments · 41 min read
On Not Pulling The Ladder Up Behind You · Screwtape · Apr 26, 2024, 9:58 PM · 188 points · 21 comments · 9 min read
Skills from a year of Purposeful Rationality Practice · Raemon · Sep 18, 2024, 2:05 AM · 187 points · 18 comments · 7 min read
Information vs Assurance · johnswentworth · Oct 20, 2024, 11:16 PM · 187 points · 17 comments · 2 min read
Daniel Kahneman has died · DanielFilan · Mar 27, 2024, 3:59 PM · 186 points · 11 comments · 1 min read · (www.washingtonpost.com)
Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party” · Ricki Heicklen · Feb 22, 2024, 11:56 PM · 186 points · 5 comments · 4 min read · (bayesshammai.substack.com)
This is already your second chance · Malmesbury · Jul 28, 2024, 5:13 PM · 185 points · 13 comments · 8 min read
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer · johnswentworth and David Lorell · Apr 18, 2024, 12:27 AM · 185 points · 21 comments · 7 min read
Humming is not a free $100 bill · Elizabeth · Jun 6, 2024, 8:10 PM · 185 points · 6 comments · 3 min read · (acesounderglass.com)
Struggling like a Shadowmoth · Raemon · Sep 24, 2024, 12:47 AM · 183 points · 38 comments · 7 min read
Introducing Alignment Stress-Testing at Anthropic · evhub · Jan 12, 2024, 11:51 PM · 182 points · 23 comments · 2 min read
Contra papers claiming superhuman AI forecasting · nikos, Peter Mühlbacher, Lawrence Phillips and dschwarz · Sep 12, 2024, 6:10 PM · 182 points · 16 comments · 7 min read
Every “Every Bay Area House Party” Bay Area House Party · Richard_Ngo · Feb 16, 2024, 6:53 PM · 181 points · 6 comments · 4 min read
Safety consultations for AI lab employees · Zach Stein-Perlman · Jul 27, 2024, 3:00 PM · 181 points · 4 comments · 1 min read
[Question] Why is o1 so deceptive? · abramdemski · Sep 27, 2024, 5:27 PM · 180 points · 24 comments · 3 min read
My motivation and theory of change for working in AI healthtech · Andrew_Critch · Oct 12, 2024, 12:36 AM · 178 points · 37 comments · 14 min read
Toward a Broader Conception of Adverse Selection · Ricki Heicklen · Mar 14, 2024, 10:40 PM · 177 points · 61 comments · 13 min read · (bayesshammai.substack.com)
FHI (Future of Humanity Institute) has shut down (2005–2024) · gwern · Apr 17, 2024, 1:54 PM · 176 points · 22 comments · 1 min read · (www.futureofhumanityinstitute.org)
WTH is Cerebrolysin, actually? · gsfitzgerald and delton137 · Aug 6, 2024, 8:40 PM · 175 points · 23 comments · 17 min read
When Is Insurance Worth It? · kqr · Dec 19, 2024, 7:07 PM · 173 points · 71 comments · 4 min read · (entropicthoughts.com)
Timaeus’s First Four Months · Jesse Hoogland, Daniel Murfet, Stan van Wingerden and Alexander Gietelink Oldenziel · Feb 28, 2024, 5:01 PM · 173 points · 6 comments · 6 min read
Three Subtle Examples of Data Leakage · abstractapplic · Oct 1, 2024, 8:45 PM · 172 points · 16 comments · 4 min read
Did Christopher Hitchens change his mind about waterboarding? · Isaac King · Sep 15, 2024, 8:28 AM · 171 points · 22 comments · 7 min read
‘Empiricism!’ as Anti-Epistemology · Eliezer Yudkowsky · Mar 14, 2024, 2:02 AM · 171 points · 92 comments · 25 min read
Reconsider the anti-cavity bacteria if you are Asian · Lao Mein · Apr 15, 2024, 7:02 AM · 170 points · 43 comments · 4 min read
o1: A Technical Primer · Jesse Hoogland · Dec 9, 2024, 7:09 PM · 170 points · 19 comments · 9 min read · (www.youtube.com)
And All the Shoggoths Merely Players · Zack_M_Davis · Feb 10, 2024, 7:56 PM · 170 points · 57 comments · 12 min read
Overcoming Bias Anthology · Arjun Panickssery · Oct 20, 2024, 2:01 AM · 169 points · 14 comments · 2 min read · (overcoming-bias-anthology.com)
Recommendation: reports on the search for missing hiker Bill Ewasko · eukaryote · Jul 31, 2024, 10:15 PM · 169 points · 28 comments · 14 min read · (eukaryotewritesblog.com)
Masterpiece · Richard_Ngo · Feb 13, 2024, 11:10 PM · 166 points · 21 comments · 4 min read · (www.narrativeark.xyz)
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks · cloud, Jacob G-W, Evzen, Joseph Miller and TurnTrout · Dec 6, 2024, 10:19 PM · 165 points · 12 comments · 11 min read · (arxiv.org)
You can remove GPT2’s LayerNorm by fine-tuning for an hour · StefanHex · Aug 8, 2024, 6:33 PM · 165 points · 11 comments · 8 min read
Boycott OpenAI · PeterMcCluskey · Jun 18, 2024, 7:52 PM · 164 points · 26 comments · 1 min read · (bayesianinvestor.com)
The Summoned Heroine’s Prediction Markets Keep Providing Financial Services To The Demon King! · abstractapplic · Oct 26, 2024, 12:34 PM · 164 points · 16 comments · 7 min read
Tips for Empirical Alignment Research · Ethan Perez · Feb 29, 2024, 6:04 AM · 163 points · 4 comments · 23 min read
Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data · Johannes Treutlein and Owain_Evans · Jun 21, 2024, 3:54 PM · 163 points · 13 comments · 8 min read · (arxiv.org)
Announcing ILIAD — Theoretical AI Alignment Conference · Nora_Ammann and Alexander Gietelink Oldenziel · Jun 5, 2024, 9:37 AM · 163 points · 18 comments · 2 min read
Many arguments for AI x-risk are wrong · TurnTrout · Mar 5, 2024, 2:31 AM · 162 points · 87 comments · 12 min read
The Median Researcher Problem · johnswentworth · Nov 2, 2024, 8:16 PM · 161 points · 70 comments · 1 min read
o1 is a bad idea · abramdemski · Nov 11, 2024, 9:20 PM · 161 points · 39 comments · 2 min read
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · Jeremy Gillen and peterbarnett · Jan 26, 2024, 7:22 AM · 161 points · 60 comments · 57 min read
Sycophancy to subterfuge: Investigating reward tampering in large language models · Carson Denison and evhub · Jun 17, 2024, 6:41 PM · 161 points · 22 comments · 8 min read · (arxiv.org)
Making every researcher seek grants is a broken model · jasoncrawford · Jan 26, 2024, 4:06 PM · 159 points · 41 comments · 4 min read · (rootsofprogress.org)
DeepMind’s “Frontier Safety Framework” is weak and unambitious · Zach Stein-Perlman · May 18, 2024, 3:00 AM · 159 points · 14 comments · 4 min read