Launching Lightspeed Grants (Apply by July 6th) · habryka · Jun 7, 2023, 2:53 AM · 211 points · 41 comments · 5 min read · LW link
Actually, Othello-GPT Has A Linear Emergent World Representation · Neel Nanda · Mar 29, 2023, 10:13 PM · 211 points · 26 comments · 19 min read · LW link · (neelnanda.io)
Thoughts on sharing information about language model capabilities · paulfchristiano · Jul 31, 2023, 4:04 PM · 210 points · 44 comments · 11 min read · LW link · 1 review
Labs should be explicit about why they are building AGI · peterbarnett · Oct 17, 2023, 9:09 PM · 210 points · 18 comments · 1 min read · LW link · 1 review
The Lighthaven Campus is open for bookings · habryka · Sep 30, 2023, 1:08 AM · 209 points · 18 comments · 5 min read · LW link · (www.lighthaven.space)
Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds · 1a3orn · Apr 4, 2023, 5:39 PM · 208 points · 38 comments · 5 min read · LW link · 1 review
Evolution provides no evidence for the sharp left turn · Quintin Pope · Apr 11, 2023, 6:43 PM · 206 points · 65 comments · 15 min read · LW link · 1 review
My current LK99 questions · Eliezer Yudkowsky · Aug 1, 2023, 10:48 PM · 206 points · 38 comments · 5 min read · LW link
Feedbackloop-first Rationality · Raemon · Aug 7, 2023, 5:58 PM · 205 points · 69 comments · 8 min read · LW link · 2 reviews
Lightcone Infrastructure/LessWrong is looking for funding · habryka · Jun 14, 2023, 4:45 AM · 205 points · 39 comments · 1 min read · LW link
If interpretability research goes well, it may get dangerous · So8res · Apr 3, 2023, 9:48 PM · 202 points · 11 comments · 2 min read · LW link
We’re Not Ready: thoughts on “pausing” and responsible scaling policies · HoldenKarnofsky · Oct 27, 2023, 3:19 PM · 200 points · 33 comments · 8 min read · LW link
My tentative best guess on how EAs and Rationalists sometimes turn crazy · habryka · Jun 21, 2023, 4:11 AM · 199 points · 110 comments · 8 min read · LW link
GPT-4 Plugs In · Zvi · Mar 27, 2023, 12:10 PM · 198 points · 47 comments · 6 min read · LW link · (thezvi.wordpress.com)
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · So8res · Nov 24, 2023, 5:37 PM · 197 points · 84 comments · 5 min read · LW link · 1 review
Thoughts on “AI is easy to control” by Pope & Belrose · Steven Byrnes · Dec 1, 2023, 5:30 PM · 197 points · 63 comments · 14 min read · LW link · 1 review
My “2.9 trauma limit” · Raemon · Jul 1, 2023, 7:32 PM · 196 points · 31 comments · 7 min read · LW link
Comp Sci in 2027 (Short story by Eliezer Yudkowsky) · sudo · Oct 29, 2023, 11:09 PM · 196 points · 24 comments · 10 min read · LW link · 1 review · (nitter.net)
Thinking By The Clock · Screwtape · Nov 8, 2023, 7:40 AM · 196 points · 29 comments · 8 min read · LW link · 1 review
Acausal normalcy · Andrew_Critch · Mar 3, 2023, 11:34 PM · 195 points · 36 comments · 8 min read · LW link · 1 review
Killing Socrates · Duncan Sabien (Deactivated) · Apr 11, 2023, 10:28 AM · 195 points · 146 comments · 8 min read · LW link · 1 review
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model · likenneth · Jun 11, 2023, 5:38 AM · 195 points · 4 comments · 1 min read · LW link · (arxiv.org)
Cognitive Emulation: A Naive AI Safety Proposal · Connor Leahy and Gabriel Alfour · Feb 25, 2023, 7:35 PM · 195 points · 46 comments · 4 min read · LW link
Is being sexy for your homies? · Valentine · Dec 13, 2023, 8:37 PM · 193 points · 100 comments · 14 min read · LW link · 2 reviews
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 1a3orn · Nov 2, 2023, 6:20 PM · 193 points · 79 comments · 23 min read · LW link
AI as a science, and three obstacles to alignment strategies · So8res · Oct 25, 2023, 9:00 PM · 193 points · 80 comments · 11 min read · LW link
AI alignment researchers don’t (seem to) stack · So8res · Feb 21, 2023, 12:48 AM · 193 points · 40 comments · 3 min read · LW link
The ‘ petertodd’ phenomenon · mwatkins · Apr 15, 2023, 12:59 AM · 192 points · 50 comments · 38 min read · LW link · 1 review
Towards Developmental Interpretability · Jesse Hoogland, Alexander Gietelink Oldenziel, Daniel Murfet and Stan van Wingerden · Jul 12, 2023, 7:33 PM · 192 points · 10 comments · 9 min read · LW link · 1 review
Sam Altman fired from OpenAI · LawrenceC · Nov 17, 2023, 8:42 PM · 192 points · 75 comments · 1 min read · LW link · (openai.com)
“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity · Thane Ruthenis · Dec 16, 2023, 8:08 PM · 191 points · 34 comments · 5 min read · LW link
Grant applications and grand narratives · Elizabeth · Jul 2, 2023, 12:16 AM · 191 points · 22 comments · 6 min read · LW link
Twiblings, four-parent babies and other reproductive technology · GeneSmith · May 20, 2023, 5:11 PM · 191 points · 33 comments · 6 min read · LW link
Cryonics and Regret · MvB · Jul 24, 2023, 9:16 AM · 190 points · 35 comments · 2 min read · LW link · 1 review
Evaluating the historical value misspecification argument · Matthew Barnett · Oct 5, 2023, 6:34 PM · 190 points · 162 comments · 7 min read · LW link · 3 reviews
Transcript and Brief Response to Twitter Conversation between Yann LeCunn and Eliezer Yudkowsky · Zvi · Apr 26, 2023, 1:10 PM · 190 points · 51 comments · 10 min read · LW link · (thezvi.wordpress.com)
The King and the Golem · Richard_Ngo · Sep 25, 2023, 7:51 PM · 190 points · 19 comments · 5 min read · LW link · 1 review · (narrativeark.substack.com)
The basic reasons I expect AGI ruin · Rob Bensinger · Apr 18, 2023, 3:37 AM · 189 points · 73 comments · 14 min read · LW link
The other side of the tidal wave · KatjaGrace · Nov 3, 2023, 5:40 AM · 189 points · 86 comments · 1 min read · LW link · (worldspiritsockpuppet.com)
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX · Bird Concept · Sep 1, 2023, 4:03 AM · 188 points · 26 comments · 24 min read · LW link · 1 review
What a compute-centric framework says about AI takeoff speeds · Tom Davidson · Jan 23, 2023, 4:02 AM · 188 points · 30 comments · 16 min read · LW link · 1 review
Effective Aspersions: How the Nonlinear Investigation Went Wrong · TracingWoodgrains · Dec 19, 2023, 12:00 PM · 188 points · 172 comments · LW link · 2 reviews
Announcing Timaeus · Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden · Oct 22, 2023, 11:59 AM · 188 points · 15 comments · 4 min read · LW link
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions · JanB, Owain_Evans and SoerenMind · Sep 28, 2023, 6:53 PM · 187 points · 39 comments · 3 min read · LW link · 1 review
EigenKarma: trust at scale · Henrik Karlsson · Feb 8, 2023, 6:52 PM · 186 points · 52 comments · 5 min read · LW link
Another medical miracle · Dentin · Jun 25, 2023, 8:43 PM · 186 points · 48 comments · 3 min read · LW link
What will GPT-2030 look like? · jsteinhardt · Jun 7, 2023, 11:40 PM · 185 points · 43 comments · 23 min read · LW link · (bounded-regret.ghost.io)
Large Language Models will be Great for Censorship · Ethan Edwards · Aug 21, 2023, 7:03 PM · 185 points · 14 comments · 8 min read · LW link · (ethanedwards.substack.com)
Why Not Just… Build Weak AI Tools For AI Alignment Research? · johnswentworth · Mar 5, 2023, 12:12 AM · 184 points · 18 comments · 6 min read · LW link
OpenAI API base models are not sycophantic, at any size · nostalgebraist · Aug 29, 2023, 12:58 AM · 183 points · 20 comments · 2 min read · LW link · (colab.research.google.com)