AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V and romeo
Apr 3, 2025, 4:23 PM
635
points
211
comments
41
min read
LW
link
(ai-2027.com)
Accountability Sinks
Martin Sustrik
Apr 22, 2025, 5:00 AM
368
points
53
comments
15
min read
LW
link
(250bpm.substack.com)
LessWrong has been acquired by EA
habryka
Apr 1, 2025, 1:09 PM
343
points
47
comments
1
min read
LW
link
VDT: a solution to decision theory
L Rudolf L
Apr 1, 2025, 9:04 PM
337
points
26
comments
4
min read
LW
link
Playing in the Creek
Hastings
Apr 10, 2025, 5:39 PM
307
points
6
comments
2
min read
LW
link
(hgreer.com)
Why Have Sentence Lengths Decreased?
Arjun Panickssery
Apr 3, 2025, 5:50 PM
266
points
89
comments
4
min read
LW
link
(arjunpanickssery.substack.com)
Why Should I Assume CCP AGI is Worse Than USG AGI?
Tomás B.
Apr 19, 2025, 2:47 PM
242
points
83
comments
1
min read
LW
link
To Understand History, Keep Former Population Distributions In Mind
Arjun Panickssery
Apr 23, 2025, 4:51 AM
224
points
13
comments
2
min read
LW
link
(arjunpanickssery.substack.com)
Jaan Tallinn’s 2024 Philanthropy Overview
jaan
Apr 23, 2025, 11:06 AM
221
points
8
comments
1
min read
LW
link
(jaan.info)
Thoughts on AI 2027
Max Harms
Apr 9, 2025, 9:26 PM
215
points
49
comments
21
min read
LW
link
(intelligence.org)
Impact, agency, and taste
benkuhn
Apr 19, 2025, 9:10 PM
201
points
10
comments
8
min read
LW
link
(www.benkuhn.net)
Short Timelines Don’t Devalue Long Horizon Research
Vladimir_Nesov
Apr 9, 2025, 12:42 AM
165
points
23
comments
1
min read
LW
link
Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala
Apr 15, 2025, 3:56 PM
163
points
48
comments
18
min read
LW
link
Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen
Apr 14, 2025, 5:38 PM
147
points
42
comments
7
min read
LW
link
(adamkarvonen.github.io)
Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes, abhayesian, Akbir Khan and Fabien Roger
Apr 8, 2025, 5:32 PM
145
points
20
comments
12
min read
LW
link
Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo
Apr 18, 2025, 12:27 PM
137
points
15
comments
6
min read
LW
link
AI-enabled coups: a small group could use AI to seize power
Tom Davidson, Lukas Finnveden and rosehadshar
Apr 16, 2025, 4:51 PM
128
points
18
comments
7
min read
LW
link
AI 2027 is a Bet Against Amdahl’s Law
snewman
Apr 21, 2025, 3:09 AM
123
points
54
comments
9
min read
LW
link
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt, Buck, Adam Kaufman, Cody Rushing and Tyler Tracy
Apr 16, 2025, 4:21 PM
122
points
0
comments
20
min read
LW
link
Learned pain as a leading cause of chronic pain
SoerenMind
Apr 9, 2025, 11:57 AM
122
points
13
comments
9
min read
LW
link
Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red
Julian Bradshaw
Apr 21, 2025, 3:52 AM
118
points
19
comments
14
min read
LW
link
“The Era of Experience” has an unsolved technical alignment problem
Steven Byrnes
Apr 24, 2025, 1:57 PM
114
points
42
comments
23
min read
LW
link
Three Months In, Evaluating Three Rationalist Cases for Trump
Arjun Panickssery
Apr 18, 2025, 8:27 AM
114
points
32
comments
4
min read
LW
link
Among Us: A Sandbox for Agentic Deception
7vik and Adrià Garriga-alonso
Apr 5, 2025, 6:24 AM
110
points
7
comments
7
min read
LW
link
New Cause Area Proposal
CallumMcDougall
Apr 1, 2025, 7:12 AM
108
points
4
comments
1
min read
LW
link
We should try to automate AI safety work asap
Marius Hobbhahn
Apr 26, 2025, 4:35 PM
106
points
10
comments
15
min read
LW
link
AI 2027: Responses
Zvi
Apr 8, 2025, 12:50 PM
106
points
3
comments
30
min read
LW
link
(thezvi.wordpress.com)
How training-gamers might function (and win)
Vivek Hebbar
Apr 11, 2025, 9:26 PM
105
points
5
comments
13
min read
LW
link
Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan and eggsyntax
Apr 2, 2025, 8:51 AM
103
points
41
comments
3
min read
LW
link
The Lizardman and the Black Hat Bobcat
Screwtape
Apr 6, 2025, 7:02 PM
96
points
13
comments
9
min read
LW
link
How to Build a Third Place on Focusmate
Parker Conley
Apr 28, 2025, 11:46 PM
92
points
3
comments
5
min read
LW
link
(parconley.com)
ASI existential risk: Reconsidering Alignment as a Goal
habryka
Apr 15, 2025, 7:57 PM
91
points
14
comments
19
min read
LW
link
(michaelnotebook.com)
How To Believe False Things
Eneasz
Apr 2, 2025, 4:28 PM
89
points
10
comments
3
min read
LW
link
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky
Apr 14, 2025, 6:40 AM
88
points
6
comments
11
min read
LW
link
Is Gemini now better than Claude at Pokémon?
Julian Bradshaw
Apr 19, 2025, 11:34 PM
88
points
12
comments
5
min read
LW
link
The Uses of Complacency
sarahconstantin
Apr 21, 2025, 6:50 PM
86
points
5
comments
8
min read
LW
link
(sarahconstantin.substack.com)
o3 Is a Lying Liar
Zvi
Apr 23, 2025, 8:00 PM
84
points
19
comments
9
min read
LW
link
(thezvi.wordpress.com)
Misrepresentation as a Barrier for Interp (Part I)
johnswentworth and Steve Petersen
Apr 29, 2025, 5:07 PM
84
points
9
comments
7
min read
LW
link
A Slow Guide to Confronting Doom
Ruby
Apr 6, 2025, 2:10 AM
83
points
20
comments
14
min read
LW
link
$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?
johnswentworth and David Lorell
Apr 21, 2025, 8:19 PM
83
points
12
comments
3
min read
LW
link
7+ tractable directions in AI control
Julian Stastny and ryan_greenblatt
Apr 28, 2025, 5:12 PM
82
points
1
comment
13
min read
LW
link
Keltham’s Lectures in Project Lawful
Morpheus
Apr 1, 2025, 10:39 AM
81
points
5
comments
2
min read
LW
link
You will crash your car in front of my house within the next week
Richard Korzekwa
Apr 1, 2025, 9:43 PM
80
points
6
comments
1
min read
LW
link
What Makes an AI Startup “Net Positive” for Safety?
jacquesthibs
Apr 18, 2025, 8:33 PM
80
points
23
comments
2
min read
LW
link
Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel and windows
Apr 3, 2025, 5:01 PM
80
points
1
comment
1
min read
LW
link
Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures
Elizabeth
Apr 29, 2025, 8:40 PM
78
points
15
comments
1
min read
LW
link
(acesounderglass.com)
Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich and Benjamin Schmidt
Apr 12, 2025, 2:24 PM
78
points
31
comments
2
min read
LW
link
New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy and Diffractor
Apr 10, 2025, 9:17 AM
77
points
4
comments
1
min read
LW
link
(arxiv.org)
PauseAI and E/Acc Should Switch Sides
WillPetillo
Apr 1, 2025, 11:25 PM
76
points
6
comments
2
min read
LW
link
Reward hacking is becoming more sophisticated and deliberate in frontier LLMs
Kei
Apr 24, 2025, 4:03 PM
76
points
6
comments
1
min read
LW
link