Page 2
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth and David Lorell · Jan 24, 2025, 8:20 PM · 180 points · 61 comments · 5 min read

Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall
Vladimir_Nesov · May 1, 2025, 1:54 PM · 172 points · 22 comments · 5 min read

So how well is Claude playing Pokémon?
Julian Bradshaw · Mar 7, 2025, 5:54 AM · 171 points · 74 comments · 5 min read

How will we update about scheming?
ryan_greenblatt · Jan 6, 2025, 8:21 PM · 171 points · 20 comments · 37 min read

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · Apr 15, 2025, 3:56 PM · 168 points · 50 comments · 18 min read

On the Rationality of Deterring ASI
Dan H · Mar 5, 2025, 4:11 PM · 166 points · 34 comments · 4 min read · (nationalsecurity.ai)

Short Timelines Don’t Devalue Long Horizon Research
Vladimir_Nesov · Apr 9, 2025, 12:42 AM · 166 points · 24 comments · 1 min read
Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit, Raymond Douglas, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet) and David Duvenaud · Jan 30, 2025, 5:03 PM · 162 points · 58 comments · 2 min read · (gradual-disempowerment.ai)

Maximizing Communication, not Traffic
jefftk · Jan 5, 2025, 1:00 PM · 161 points · 10 comments · 1 min read · (www.jefftk.com)

[Question] Have LLMs Generated Novel Insights?
abramdemski and Cole Wyeth · Feb 23, 2025, 6:22 PM · 158 points · 38 comments · 2 min read

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · Mar 16, 2025, 4:52 PM · 157 points · 25 comments · 1 min read

Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio · Mar 13, 2025, 7:09 PM · 155 points · 40 comments · 6 min read

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen · Apr 14, 2025, 5:38 PM · 154 points · 42 comments · 7 min read · (adamkarvonen.github.io)

Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · Mar 2, 2025, 7:51 PM · 154 points · 27 comments · 1 min read · (turntrout.com)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard · Mar 2, 2025, 8:26 PM · 154 points · 26 comments · 9 min read

It’s been ten years. I propose HPMOR Anniversary Parties.
Screwtape · Feb 16, 2025, 1:43 AM · 153 points · 3 comments · 1 min read

Don’t ignore bad vibes you get from people
Kaj_Sotala · Jan 18, 2025, 9:20 AM · 152 points · 50 comments · 2 min read · (kajsotala.fi)

OpenAI #10: Reflections
Zvi · Jan 7, 2025, 5:00 PM · 149 points · 7 comments · 11 min read · (thezvi.wordpress.com)

Conceptual Rounding Errors
Jan_Kulveit · Mar 26, 2025, 7:00 PM · 149 points · 15 comments · 3 min read · (boundedlyrational.substack.com)

Capital Ownership Will Not Prevent Human Disempowerment
beren · Jan 5, 2025, 6:00 AM · 149 points · 18 comments · 14 min read

Quotes from the Stargate press conference
Nikola Jurkovic · Jan 22, 2025, 12:50 AM · 149 points · 7 comments · 1 min read · (www.c-span.org)

Methods for strong human germline engineering
TsviBT · Mar 3, 2025, 8:13 AM · 149 points · 28 comments · 108 min read

A computational no-coincidence principle
Eric Neyman · Feb 14, 2025, 9:39 PM · 148 points · 38 comments · 6 min read · (www.alignment.org)
Levels of Friction
Zvi · Feb 10, 2025, 1:10 PM · 148 points · 8 comments · 12 min read · (thezvi.wordpress.com)

Winning the power to lose
KatjaGrace · May 20, 2025, 6:40 AM · 148 points · 37 comments · 2 min read · (worldspiritsockpuppet.com)

Activation space interpretability may be doomed
bilalchughtai and Lucius Bushnaq · Jan 8, 2025, 12:49 PM · 148 points · 33 comments · 8 min read

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
Thane Ruthenis · Feb 21, 2025, 8:15 PM · 148 points · 51 comments · 6 min read

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes, abhayesian, Akbir Khan and Fabien Roger · Apr 8, 2025, 5:32 PM · 146 points · 20 comments · 12 min read

AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · Jan 23, 2025, 6:41 PM · 145 points · 5 comments · 13 min read

Applying traditional economic thinking to AGI: a trilemma
Steven Byrnes · Jan 13, 2025, 1:23 AM · 144 points · 32 comments · 3 min read

The Most Forbidden Technique
Zvi · Mar 12, 2025, 1:20 PM · 143 points · 9 comments · 17 min read · (thezvi.wordpress.com)

The Hidden Cost of Our Lies to AI
Nicholas Andresen · Mar 6, 2025, 5:03 AM · 142 points · 18 comments · 7 min read · (substack.com)
Auditing language models for hidden objectives
Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub · Mar 13, 2025, 7:18 PM · 141 points · 15 comments · 13 min read

Human takeover might be worse than AI takeover
Tom Davidson · Jan 10, 2025, 4:53 PM · 141 points · 55 comments · 8 min read

OpenAI #12: Battle of the Board Redux
Zvi · Mar 31, 2025, 3:50 PM · 141 points · 1 comment · 9 min read · (thezvi.wordpress.com)

Ten people on the inside
Buck · Jan 28, 2025, 4:41 PM · 139 points · 28 comments · 4 min read

Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo · Apr 18, 2025, 12:27 PM · 139 points · 15 comments · 6 min read

What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · Jan 6, 2025, 7:57 PM · 139 points · 57 comments · 13 min read

Planning for Extreme AI Risks
joshc · Jan 29, 2025, 6:33 PM · 139 points · 5 comments · 16 min read

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Thane Ruthenis · Mar 4, 2025, 4:23 PM · 137 points · 51 comments · 3 min read
[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty
tandem · Jan 7, 2025, 7:11 PM · 137 points · 5 comments · 1 min read

The Failed Strategy of Artificial Intelligence Doomers
Ben Pace · Jan 31, 2025, 6:56 PM · 136 points · 78 comments · 5 min read · (www.palladiummag.com)

Anomalous Tokens in DeepSeek-V3 and r1
henry · Jan 25, 2025, 10:55 PM · 136 points · 3 comments · 7 min read

The Milton Friedman Model of Policy Change
JohnofCharleston · Mar 4, 2025, 12:38 AM · 136 points · 17 comments · 4 min read

Training on Documents About Reward Hacking Induces Reward Hacking
evhub and Nathan Hu · Jan 21, 2025, 9:32 PM · 131 points · 15 comments · 2 min read · (alignment.anthropic.com)

AI Doomerism in 1879
David Gross · May 13, 2025, 2:48 AM · 131 points · 36 comments · 8 min read

It’s Okay to Feel Bad for a Bit
moridinamael · May 10, 2025, 11:24 PM · 131 points · 26 comments · 3 min read

Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto and Owain_Evans · Jan 22, 2025, 12:47 AM · 130 points · 5 comments · 6 min read

Building AI Research Fleets
Ben Goldhaber and Jesse Hoogland · Jan 12, 2025, 6:23 PM · 130 points · 11 comments · 5 min read

Consider not donating under $100 to political candidates
DanielFilan · May 11, 2025, 3:20 AM · 130 points · 31 comments · 1 min read · (danielfilan.com)