LessWrong Archive: February 2025
How to Make Superbabies · GeneSmith and kman · Feb 19, 2025, 8:39 PM · 591 points · 337 comments · 31 min read
How AI Takeover Might Happen in 2 Years · joshc · Feb 7, 2025, 5:10 PM · 416 points · 137 comments · 29 min read · (x.com)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · Jan Betley and Owain_Evans · Feb 25, 2025, 5:39 PM · 328 points · 91 comments · 4 min read
Murder plots are infohazards · Chris Monteiro · Feb 13, 2025, 7:15 PM · 300 points · 44 comments · 2 min read
So You Want To Make Marginal Progress... · johnswentworth · Feb 7, 2025, 11:22 PM · 284 points · 42 comments · 4 min read
Arbital has been imported to LessWrong · RobertM, jimrandomh, Ben Pace and Ruby · Feb 20, 2025, 12:47 AM · 279 points · 30 comments · 5 min read
A History of the Future, 2025-2040 · L Rudolf L · Feb 17, 2025, 12:03 PM · 231 points · 41 comments · 75 min read · (nosetgauge.substack.com)
Power Lies Trembling: a three-book review · Richard_Ngo · Feb 22, 2025, 10:57 PM · 211 points · 27 comments · 15 min read · (www.mindthefuture.info)
Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion? · garrison · Feb 11, 2025, 12:20 AM · 208 points · 8 comments · (garrisonlovely.substack.com)
Eliezer’s Lost Alignment Articles / The Arbital Sequence · Ruby and RobertM · Feb 20, 2025, 12:48 AM · 207 points · 9 comments · 5 min read
[Question] Have LLMs Generated Novel Insights? · abramdemski and Cole Wyeth · Feb 23, 2025, 6:22 PM · 155 points · 36 comments · 2 min read
It’s been ten years. I propose HPMOR Anniversary Parties. · Screwtape · Feb 16, 2025, 1:43 AM · 153 points · 3 comments · 1 min read
Levels of Friction · Zvi · Feb 10, 2025, 1:10 PM · 148 points · 8 comments · 12 min read · (thezvi.wordpress.com)
The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better · Thane Ruthenis · Feb 21, 2025, 8:15 PM · 148 points · 51 comments · 6 min read
A computational no-coincidence principle · Eric Neyman · Feb 14, 2025, 9:39 PM · 148 points · 38 comments · 6 min read · (www.alignment.org)
The Paris AI Anti-Safety Summit · Zvi · Feb 12, 2025, 2:00 PM · 129 points · 21 comments · 21 min read · (thezvi.wordpress.com)
Gradual Disempowerment, Shell Games and Flinches · Jan_Kulveit · Feb 2, 2025, 2:47 PM · 126 points · 36 comments · 6 min read
Research directions Open Phil wants to fund in technical AI safety · jake_mendel, maxnadeau and Peter Favaloro · Feb 8, 2025, 1:40 AM · 116 points · 21 comments · 58 min read · (www.openphilanthropy.org)
The News is Never Neglected · lsusr · Feb 11, 2025, 2:59 PM · 111 points · 18 comments · 1 min read
Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas · jake_mendel, maxnadeau and Peter Favaloro · Feb 6, 2025, 6:58 PM · 111 points · 0 comments · 1 min read · (www.openphilanthropy.org)
Two hemispheres—I do not think it means what you think it means · Viliam · Feb 9, 2025, 3:33 PM · 108 points · 21 comments · 14 min read
You can just wear a suit · lsusr · Feb 26, 2025, 2:57 PM · 108 points · 48 comments · 2 min read
My model of what is going on with LLMs · Cole Wyeth · Feb 13, 2025, 3:43 AM · 104 points · 49 comments · 7 min read
Judgements: Merging Prediction & Evidence · abramdemski · Feb 23, 2025, 7:35 PM · 103 points · 5 comments · 6 min read
Detecting Strategic Deception Using Linear Probes · Nicholas Goldowsky-Dill, bilalchughtai, StefanHex and Marius Hobbhahn · Feb 6, 2025, 3:46 PM · 102 points · 9 comments · 2 min read · (arxiv.org)
AGI Safety & Alignment @ Google DeepMind is hiring · Rohin Shah · Feb 17, 2025, 9:11 PM · 102 points · 19 comments · 10 min read
A short course on AGI safety from the GDM Alignment team · Vika and Rohin Shah · Feb 14, 2025, 3:43 PM · 101 points · 1 comment · 1 min read · (deepmindsafetyresearch.medium.com)
C’mon guys, Deliberate Practice is Real · Raemon · Feb 5, 2025, 10:33 PM · 98 points · 25 comments · 9 min read
Timaeus in 2024 · Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel and Daniel Murfet · Feb 20, 2025, 11:54 PM · 96 points · 1 comment · 8 min read
Reviewing LessWrong: Screwtape’s Basic Answer · Screwtape · Feb 5, 2025, 4:30 AM · 96 points · 18 comments · 6 min read
Dear AGI, · Nathan Young · Feb 18, 2025, 10:48 AM · 90 points · 11 comments · 3 min read
Wired on: “DOGE personnel with admin access to Federal Payment System” · Raemon · Feb 5, 2025, 9:32 PM · 88 points · 45 comments · 2 min read · (web.archive.org)
Anthropic releases Claude 3.7 Sonnet with extended thinking mode · LawrenceC · Feb 24, 2025, 7:32 PM · 88 points · 8 comments · 4 min read · (www.anthropic.com)
The Risk of Gradual Disempowerment from AI · Zvi · Feb 5, 2025, 10:10 PM · 86 points · 15 comments · 20 min read · (thezvi.wordpress.com)
Voting Results for the 2023 Review · Raemon · Feb 6, 2025, 8:00 AM · 86 points · 3 comments · 69 min read
How might we safely pass the buck to AI? · joshc · Feb 19, 2025, 5:48 PM · 83 points · 58 comments · 31 min read
Ambiguous out-of-distribution generalization on an algorithmic task · Wilson Wu and Louis Jaburi · Feb 13, 2025, 6:24 PM · 83 points · 6 comments · 11 min read
The Mask Comes Off: A Trio of Tales · Zvi · Feb 14, 2025, 3:30 PM · 81 points · 1 comment · 13 min read · (thezvi.wordpress.com)
Microplastics: Much Less Than You Wanted To Know · jenn, kaleb and Brent · Feb 15, 2025, 7:08 PM · 80 points · 8 comments · 13 min read
[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations · Lucy Farnik · Feb 26, 2025, 12:50 PM · 79 points · 8 comments · 7 min read
OpenAI releases deep research agent · Seth Herd · Feb 3, 2025, 12:48 PM · 78 points · 21 comments · 3 min read · (openai.com)
Pick two: concise, comprehensive, or clear rules · Screwtape · Feb 3, 2025, 6:39 AM · 78 points · 27 comments · 8 min read
Evaluating “What 2026 Looks Like” So Far · Jonny Spicer · Feb 24, 2025, 6:55 PM · 77 points · 5 comments · 7 min read
Anti-Slop Interventions? · abramdemski · Feb 4, 2025, 7:50 PM · 76 points · 33 comments · 6 min read
The Simplest Good · Jesse Hoogland · Feb 2, 2025, 7:51 PM · 75 points · 6 comments · 5 min read
MATS Applications + Research Directions I’m Currently Excited About · Neel Nanda · Feb 6, 2025, 11:03 AM · 73 points · 7 comments · 8 min read
Osaka · lsusr · Feb 26, 2025, 1:50 PM · 72 points · 11 comments · 1 min read
A Problem to Solve Before Building a Deception Detector · Eleni Angelou and lewis smith · Feb 7, 2025, 7:35 PM · 71 points · 12 comments · 14 min read
Thermodynamic entropy = Kolmogorov complexity · Aram Ebtekar · Feb 17, 2025, 5:56 AM · 70 points · 12 comments · 1 min read · (doi.org)
Language Models Use Trigonometry to Do Addition · Subhash Kantamneni · Feb 5, 2025, 1:50 PM · 70 points · 1 comment · 10 min read