Archive: March 2023
The Waluigi Effect (mega-post) · Cleo Nardo · Mar 3, 2023, 3:22 AM · 628 points · 188 comments · 16 min read
My Objections to “We’re All Gonna Die with Eliezer Yudkowsky” · Quintin Pope · Mar 21, 2023, 12:06 AM · 359 points · 233 comments · 39 min read · 1 review
Shutting Down the Lightcone Offices · habryka and Ben Pace · Mar 14, 2023, 10:47 PM · 338 points · 103 comments · 17 min read · 2 reviews
Understanding and controlling a maze-solving policy network · TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell · Mar 11, 2023, 6:59 PM · 333 points · 28 comments · 23 min read
The Parable of the King and the Random Process · moridinamael · Mar 1, 2023, 10:18 PM · 312 points · 26 comments · 6 min read · 3 reviews
Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky · jacquesthibs · Mar 29, 2023, 11:16 PM · 291 points · 297 comments · 3 min read · (time.com)
Discussion with Nate Soares on a key alignment difficulty · HoldenKarnofsky · Mar 13, 2023, 9:20 PM · 265 points · 43 comments · 22 min read · 1 review
“Carefully Bootstrapped Alignment” is organizationally hard · Raemon · Mar 17, 2023, 6:00 PM · 262 points · 23 comments · 11 min read · 1 review
Deep Deceptiveness · So8res · Mar 21, 2023, 2:51 AM · 251 points · 60 comments · 14 min read · 1 review
Natural Abstractions: Key claims, Theorems, and Critiques · LawrenceC, Leon Lang and Erik Jenner · Mar 16, 2023, 4:37 PM · 241 points · 26 comments · 45 min read · 3 reviews
More information about the dangerous capability evaluations we did with GPT-4 and Claude. · Beth Barnes · Mar 19, 2023, 12:25 AM · 233 points · 54 comments · 8 min read · (evals.alignment.org)
An AI risk argument that resonates with NYTimes readers · Julian Bradshaw · Mar 12, 2023, 11:09 PM · 212 points · 14 comments · 1 min read
Actually, Othello-GPT Has A Linear Emergent World Representation · Neel Nanda · Mar 29, 2023, 10:13 PM · 211 points · 26 comments · 19 min read · (neelnanda.io)
The salt in pasta water fallacy · Thomas Sepulchre · Mar 27, 2023, 2:53 PM · 204 points · 46 comments · 3 min read · 2 reviews
GPT-4 Plugs In · Zvi · Mar 27, 2023, 12:10 PM · 198 points · 47 comments · 6 min read · (thezvi.wordpress.com)
Acausal normalcy · Andrew_Critch · Mar 3, 2023, 11:34 PM · 195 points · 36 comments · 8 min read · 1 review
Why Not Just… Build Weak AI Tools For AI Alignment Research? · johnswentworth · Mar 5, 2023, 12:12 AM · 183 points · 18 comments · 6 min read
ChatGPT (and now GPT4) is very easily distracted from its rules · dmcs · Mar 15, 2023, 5:55 PM · 180 points · 42 comments · 1 min read
A rough and incomplete review of some of John Wentworth’s research · So8res · Mar 28, 2023, 6:52 PM · 175 points · 18 comments · 18 min read
Anthropic’s Core Views on AI Safety · Zac Hatfield-Dodds · Mar 9, 2023, 4:55 PM · 172 points · 39 comments · 2 min read · (www.anthropic.com)
A stylized dialogue on John Wentworth’s claims about markets and optimization · So8res · Mar 25, 2023, 10:32 PM · 169 points · 22 comments · 8 min read
What Discovering Latent Knowledge Did and Did Not Find · Fabien Roger · Mar 13, 2023, 7:29 PM · 166 points · 17 comments · 11 min read
Towards understanding-based safety evaluations · evhub · Mar 15, 2023, 6:18 PM · 164 points · 16 comments · 5 min read
What would a compute monitoring plan look like? [Linkpost] · Orpheus16 · Mar 26, 2023, 7:33 PM · 158 points · 10 comments · 4 min read · (arxiv.org)
Inside the mind of a superhuman Go model: How does Leela Zero read ladders? · Haoxing Du · Mar 1, 2023, 1:47 AM · 157 points · 8 comments · 30 min read
AI: Practical Advice for the Worried · Zvi · Mar 1, 2023, 12:30 PM · 155 points · 49 comments · 16 min read · 2 reviews · (thezvi.wordpress.com)
POC || GTFO culture as partial antidote to alignment wordcelism · lc · Mar 15, 2023, 10:21 AM · 155 points · 13 comments · 7 min read · 2 reviews
Why Not Just Outsource Alignment Research To An AI? · johnswentworth · Mar 9, 2023, 9:49 PM · 151 points · 50 comments · 9 min read · 1 review
GPT-4 · nz · Mar 14, 2023, 5:02 PM · 151 points · 150 comments · 1 min read · (openai.com)
Why I’m not into the Free Energy Principle · Steven Byrnes · Mar 2, 2023, 7:27 PM · 149 points · 50 comments · 9 min read · 1 review
Comments on OpenAI’s “Planning for AGI and beyond” · So8res · Mar 3, 2023, 11:01 PM · 148 points · 2 comments · 14 min read
Dan Luu on “You can only communicate one top priority” · Raemon · Mar 18, 2023, 6:55 PM · 148 points · 18 comments · 3 min read · (twitter.com)
Remarks 1–18 on GPT (compressed) · Cleo Nardo · Mar 20, 2023, 10:27 PM · 145 points · 35 comments · 31 min read
The Translucent Thoughts Hypotheses and Their Implications · Fabien Roger · Mar 9, 2023, 4:30 PM · 142 points · 7 comments · 19 min read
Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · ArthurB · Mar 9, 2023, 9:26 AM · 140 points · 33 comments · 2 min read
Against LLM Reductionism · Erich_Grunewald · Mar 8, 2023, 3:52 PM · 140 points · 17 comments · 18 min read · (www.erichgrunewald.com)
Conceding a short timelines bet early · Matthew Barnett · Mar 16, 2023, 9:49 PM · 133 points · 17 comments · 1 min read
Good News, Everyone! · jbash · Mar 25, 2023, 1:48 PM · 132 points · 23 comments · 2 min read
We have to Upgrade · Jed McCaleb · Mar 23, 2023, 5:53 PM · 129 points · 35 comments · 2 min read
[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy · Vika and Rohin Shah · Mar 7, 2023, 11:55 AM · 128 points · 13 comments · 5 min read · (drive.google.com)
High Status Eschews Quantification of Performance · niplav · Mar 19, 2023, 10:14 PM UTC · 128 points · 36 comments · 5 min read
FLI open letter: Pause giant AI experiments · Zach Stein-Perlman · Mar 29, 2023, 4:04 AM UTC · 126 points · 123 comments · 2 min read · (futureoflife.org)
How bad a future do ML researchers expect? · KatjaGrace · Mar 9, 2023, 4:50 AM UTC · 122 points · 8 comments · 2 min read · (aiimpacts.org)
Manifold: If okay AGI, why? · Eliezer Yudkowsky · Mar 25, 2023, 10:43 PM UTC · 120 points · 37 comments · 1 min read · (manifold.markets)
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so · Christopher King · Mar 15, 2023, 12:29 AM UTC · 116 points · 22 comments · 2 min read
Parasitic Language Games: maintaining ambiguity to hide conflict while burning the commons · Hazard · Mar 12, 2023, 5:25 AM UTC · 115 points · 17 comments · 13 min read
“Publish or Perish” (a quick note on why you should try to make your work legible to existing academic communities) · David Scott Krueger (formerly: capybaralet) · Mar 18, 2023, 7:01 PM UTC · 112 points · 49 comments · 1 min read · 1 review
GPT can write Quines now (GPT-4) · Andrew_Critch · Mar 14, 2023, 7:18 PM UTC · 112 points · 30 comments · 1 min read
Here, have a calmness video · Kaj_Sotala · Mar 16, 2023, 10:00 AM UTC · 111 points · 15 comments · 2 min read · (www.youtube.com)
“Liquidity” vs “solvency” in bank runs (and some notes on Silicon Valley Bank) · rossry · Mar 12, 2023, 9:16 AM UTC · 108 points · 27 comments · 12 min read