Scale Was All We Needed, At First
Gabe M
Feb 14, 2024, 1:49 AM
295
points
34
comments
8
min read
LW
link
(aiacumen.substack.com)
Raising children on the eve of AI
juliawise
Feb 15, 2024, 9:28 PM
275
points
47
comments
5
min read
LW
link
“No-one in my org puts money in their pension”
Tobes
Feb 16, 2024, 6:33 PM
271
points
16
comments
9
min read
LW
link
(seekingtobejolly.substack.com)
Believing In
AnnaSalamon
Feb 8, 2024, 7:06 AM
241
points
51
comments
13
min read
LW
link
CFAR Takeaways: Andrew Critch
Raemon
Feb 14, 2024, 1:37 AM
217
points
64
comments
5
min read
LW
link
Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko
Feb 3, 2024, 8:36 PM
209
points
156
comments
9
min read
LW
link
Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison
Feb 10, 2024, 7:52 PM
198
points
52
comments
LW
link
(garrisonlovely.substack.com)
Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen
Feb 22, 2024, 11:56 PM
186
points
5
comments
4
min read
LW
link
(bayesshammai.substack.com)
Every “Every Bay Area House Party” Bay Area House Party
Richard_Ngo
Feb 16, 2024, 6:53 PM
181
points
6
comments
4
min read
LW
link
Timaeus’s First Four Months
Jesse Hoogland, Daniel Murfet, Stan van Wingerden and Alexander Gietelink Oldenziel
Feb 28, 2024, 5:01 PM
173
points
6
comments
6
min read
LW
link
And All the Shoggoths Merely Players
Zack_M_Davis
Feb 10, 2024, 7:56 PM
170
points
57
comments
12
min read
LW
link
Masterpiece
Richard_Ngo
Feb 13, 2024, 11:10 PM
166
points
21
comments
4
min read
LW
link
(www.narrativeark.xyz)
2023 Survey Results
Screwtape
Feb 16, 2024, 10:24 PM
150
points
26
comments
44
min read
LW
link
Updatelessness doesn’t solve most problems
Martín Soto
Feb 8, 2024, 5:30 PM
135
points
45
comments
12
min read
LW
link
Things I’ve Grieved
Raemon
Feb 18, 2024, 7:32 PM
125
points
6
comments
2
min read
LW
link
The Pareto Best and the Curse of Doom
Screwtape
Feb 21, 2024, 11:10 PM
120
points
21
comments
9
min read
LW
link
Rationality Research Report: Towards 10x OODA Looping?
Raemon
Feb 24, 2024, 9:06 PM
117
points
25
comments
15
min read
LW
link
Attitudes about Applied Rationality
Camille Berger
Feb 3, 2024, 2:42 PM
108
points
18
comments
4
min read
LW
link
Skills I’d like my collaborators to have
Raemon
Feb 9, 2024, 8:20 AM
106
points
9
comments
8
min read
LW
link
A Chess-GPT Linear Emergent World Representation
Adam Karvonen
Feb 8, 2024, 4:25 AM
105
points
14
comments
7
min read
LW
link
(adamkarvonen.github.io)
New LessWrong review winner UI (“The LeastWrong” section and full-art post pages)
kave
Feb 28, 2024, 2:42 AM
105
points
64
comments
1
min read
LW
link
Dreams of AI alignment: The danger of suggestive names
TurnTrout
Feb 10, 2024, 1:22 AM
103
points
59
comments
4
min read
LW
link
Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom
Feb 2, 2024, 6:54 AM
103
points
37
comments
15
min read
LW
link
Lsusr’s Rationality Dojo
lsusr
Feb 13, 2024, 5:52 AM
103
points
17
comments
2
min read
LW
link
Counting arguments provide no evidence for AI doom
Nora Belrose and Quintin Pope
Feb 27, 2024, 11:03 PM
101
points
188
comments
14
min read
LW
link
My cover story in Jacobin on AI capitalism and the x-risk debates
garrison
Feb 12, 2024, 11:34 PM
98
points
5
comments
LW
link
(jacobin.com)
Announcing the London Initiative for Safe AI (LISA)
James Fox, mike_safeAI and Ryan Kidd
Feb 2, 2024, 11:17 PM
98
points
0
comments
9
min read
LW
link
Things You’re Allowed to Do: University Edition
Saul Munn
Feb 6, 2024, 12:36 AM
97
points
13
comments
5
min read
LW
link
(www.brasstacks.blog)
OpenAI’s Sora is an agent
Caleb Biddulph
Feb 16, 2024, 7:35 AM
97
points
25
comments
4
min read
LW
link
Ideological Bayesians
Kevin Dorst
Feb 25, 2024, 2:17 PM
96
points
4
comments
10
min read
LW
link
(kevindorst.substack.com)
Everything Wrong with Roko’s Claims about an Engineered Pandemic
WitheringWeights
Feb 22, 2024, 3:59 PM
94
points
10
comments
16
min read
LW
link
How well do truth probes generalise?
mishajw
Feb 24, 2024, 2:12 PM
93
points
11
comments
9
min read
LW
link
How to train your own “Sleeper Agents”
evhub
Feb 7, 2024, 12:31 AM
92
points
11
comments
2
min read
LW
link
story-based decision-making
bhauth
Feb 7, 2024, 2:35 AM
90
points
11
comments
4
min read
LW
link
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine, Sam Bowman and Ethan Perez
Feb 7, 2024, 9:28 PM
89
points
14
comments
9
min read
LW
link
(arxiv.org)
More Hyphenation
Arjun Panickssery
Feb 7, 2024, 7:43 PM
88
points
19
comments
1
min read
LW
link
(arjunpanickssery.substack.com)
Addressing Feature Suppression in SAEs
Benjamin Wright and Lee Sharkey
Feb 16, 2024, 6:32 PM
86
points
4
comments
10
min read
LW
link
AI #51: Altman’s Ambition
Zvi
Feb 20, 2024, 7:50 PM
83
points
5
comments
38
min read
LW
link
(thezvi.wordpress.com)
Retirement Accounts and Short Timelines
jefftk
Feb 19, 2024, 6:50 PM
83
points
35
comments
2
min read
LW
link
(www.jefftk.com)
The Gemini Incident
Zvi
Feb 22, 2024, 9:00 PM
80
points
19
comments
18
min read
LW
link
(thezvi.wordpress.com)
Wrong answer bias
lemonhope
Feb 1, 2024, 8:05 PM
78
points
23
comments
1
min read
LW
link
Attention SAEs Scale to GPT-2 Small
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Feb 3, 2024, 6:50 AM
78
points
4
comments
8
min read
LW
link
Analogies between scaling labs and misaligned superintelligent AI
scasper
21 Feb 2024 19:29 UTC
77
points
5
comments
4
min read
LW
link
My guess at Conjecture’s vision: triggering a narrative bifurcation
Alexandre Variengien
6 Feb 2024 19:10 UTC
75
points
12
comments
16
min read
LW
link
Implementing activation steering
Annah
5 Feb 2024 17:51 UTC
75
points
8
comments
7
min read
LW
link
Do sparse autoencoders find “true features”?
Demian Till
22 Feb 2024 18:06 UTC
74
points
33
comments
11
min read
LW
link
The One and a Half Gemini
Zvi
22 Feb 2024 13:10 UTC
73
points
4
comments
8
min read
LW
link
(thezvi.wordpress.com)
Preventing model exfiltration with upload limits
ryan_greenblatt
6 Feb 2024 16:29 UTC
71
points
22
comments
14
min read
LW
link
Survey for alignment researchers!
Cameron Berg, Judd Rosenblatt and AE Studio
2 Feb 2024 20:41 UTC
71
points
11
comments
1
min read
LW
link
Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis
simeon_c
1 Feb 2024 21:30 UTC
69
points
17
comments
1
min read
LW
link
(www.aria.org.uk)