AGI Ruin: A List of Lethalities
Eliezer Yudkowsky | 5 Jun 2022 22:05 UTC | 908 points | 701 comments | 30 min read | 3 reviews

Where I agree and disagree with Eliezer
paulfchristiano | 19 Jun 2022 19:15 UTC | 888 points | 220 comments | 18 min read | 2 reviews

What an actually pessimistic containment strategy looks like
lc | 5 Apr 2022 0:19 UTC | 675 points | 138 comments | 6 min read | 2 reviews

Simulators
janus | 2 Sep 2022 12:45 UTC | 609 points | 162 comments | 41 min read | 8 reviews | (generative.ink)

Let’s think about slowing down AI
KatjaGrace | 22 Dec 2022 17:40 UTC | 549 points | 182 comments | 38 min read | 3 reviews | (aiimpacts.org)

The Redaction Machine
Ben | 20 Sep 2022 22:03 UTC | 500 points | 48 comments | 27 min read | 1 review

Luck based medicine: my resentful story of becoming a medical miracle
Elizabeth | 16 Oct 2022 17:40 UTC | 483 points | 121 comments | 12 min read | 3 reviews | (acesounderglass.com)

Losing the root for the tree
Adam Zerner | 20 Sep 2022 4:53 UTC | 474 points | 31 comments | 9 min read | 1 review

Counter-theses on Sleep
Natália | 21 Mar 2022 23:21 UTC | 444 points | 131 comments | 15 min read | 1 review

It’s Probably Not Lithium
Natália | 28 Jun 2022 21:24 UTC | 442 points | 187 comments | 28 min read | 1 review

chinchilla’s wild implications
nostalgebraist | 31 Jul 2022 1:18 UTC | 420 points | 128 comments | 10 min read | 1 review

(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen and elifland | 29 Aug 2022 1:23 UTC | 413 points | 90 comments | 37 min read | 1 review

It Looks Like You’re Trying To Take Over The World
gwern | 9 Mar 2022 16:35 UTC | 406 points | 120 comments | 1 min read | 1 review | (www.gwern.net)

DeepMind alignment team opinions on AGI ruin arguments
Vika | 12 Aug 2022 21:06 UTC | 395 points | 37 comments | 14 min read | 1 review

Reflections on six months of fatherhood
jasoncrawford | 31 Jan 2022 5:28 UTC | 387 points | 24 comments | 4 min read | 1 review | (jasoncrawford.org)

Reward is not the optimization target
TurnTrout | 25 Jul 2022 0:03 UTC | 376 points | 123 comments | 10 min read | 3 reviews

Lies Told To Children
Eliezer Yudkowsky | 14 Apr 2022 11:25 UTC | 375 points | 94 comments | 7 min read | 1 review

You Are Not Measuring What You Think You Are Measuring
johnswentworth | 20 Sep 2022 20:04 UTC | 374 points | 44 comments | 8 min read | 2 reviews

A Mechanistic Interpretability Analysis of Grokking
Neel Nanda and Tom Lieberum | 15 Aug 2022 2:41 UTC | 373 points | 47 comments | 36 min read | 1 review | (colab.research.google.com)

Counterarguments to the basic AI x-risk case
KatjaGrace | 14 Oct 2022 13:00 UTC | 370 points | 124 comments | 34 min read | 1 review | (aiimpacts.org)

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra | 18 Jul 2022 19:06 UTC | 365 points | 94 comments | 75 min read | 1 review

Accounting For College Costs
johnswentworth | 1 Apr 2022 17:28 UTC | 363 points | 41 comments | 7 min read

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
elspood | 21 Jun 2022 23:55 UTC | 361 points | 42 comments | 7 min read | 1 review

What DALL-E 2 can and cannot do
Swimmer963 (Miranda Dixon-Luinenburg) | 1 May 2022 23:51 UTC | 353 points | 303 comments | 9 min read

Staring into the abyss as a core life skill
benkuhn | 22 Dec 2022 15:30 UTC | 341 points | 21 comments | 12 min read | 1 review | (www.benkuhn.net)

MIRI announces new “Death With Dignity” strategy
Eliezer Yudkowsky | 2 Apr 2022 0:43 UTC | 339 points | 545 comments | 18 min read | 1 review

What should you change in response to an “emergency”? And AI risk
AnnaSalamon | 18 Jul 2022 1:11 UTC | 336 points | 60 comments | 6 min read | 1 review

Why I think strong general AI is coming soon
porby | 28 Sep 2022 5:40 UTC | 335 points | 141 comments | 34 min read | 1 review

Looking back on my alignment PhD
TurnTrout | 1 Jul 2022 3:19 UTC | 331 points | 64 comments | 11 min read

Beware boasting about non-existent forecasting track records
Jotto999 | 20 May 2022 19:20 UTC | 331 points | 112 comments | 5 min read | 1 review

Optimality is the tiger, and agents are its teeth
Veedrac | 2 Apr 2022 0:46 UTC | 318 points | 42 comments | 16 min read | 1 review

Models Don’t “Get Reward”
Sam Ringer | 30 Dec 2022 10:37 UTC | 312 points | 61 comments | 5 min read | 1 review

Six Dimensions of Operational Adequacy in AGI Projects
Eliezer Yudkowsky | 30 May 2022 17:00 UTC | 309 points | 66 comments | 13 min read | 1 review

Epistemic Legibility
Elizabeth | 9 Feb 2022 18:10 UTC | 306 points | 30 comments | 20 min read | 1 review | (acesounderglass.com)

On how various plans miss the hard bits of the alignment challenge
So8res | 12 Jul 2022 2:49 UTC | 305 points | 88 comments | 29 min read | 3 reviews

Why Agent Foundations? An Overly Abstract Explanation
johnswentworth | 25 Mar 2022 23:17 UTC | 301 points | 56 comments | 8 min read | 1 review

A challenge for AGI organizations, and a challenge for readers
Rob Bensinger and Eliezer Yudkowsky | 1 Dec 2022 23:11 UTC | 301 points | 33 comments | 2 min read

Two-year update on my personal AI timelines
Ajeya Cotra | 2 Aug 2022 23:07 UTC | 293 points | 60 comments | 16 min read

Mysteries of mode collapse
janus | 8 Nov 2022 10:37 UTC | 283 points | 57 comments | 14 min read | 1 review

A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res | 15 Jun 2022 13:10 UTC | 282 points | 54 comments | 10 min read | 1 review

We Choose To Align AI
johnswentworth | 1 Jan 2022 20:06 UTC | 280 points | 16 comments | 3 min read | 1 review

Don’t die with dignity; instead play to your outs
Jeffrey Ladish | 6 Apr 2022 7:53 UTC | 279 points | 60 comments | 5 min read

What Are You Tracking In Your Head?
johnswentworth | 28 Jun 2022 19:30 UTC | 279 points | 83 comments | 4 min read | 1 review

Is AI Progress Impossible To Predict?
alyssavance | 15 May 2022 18:30 UTC | 277 points | 39 comments | 2 min read

Sazen
Duncan Sabien (Deactivated) | 21 Dec 2022 7:54 UTC | 276 points | 83 comments | 12 min read | 2 reviews

Toni Kurz and the Insanity of Climbing Mountains
GeneSmith | 3 Jul 2022 20:51 UTC | 270 points | 67 comments | 11 min read | 2 reviews

Humans are very reliable agents
alyssavance | 16 Jun 2022 22:02 UTC | 266 points | 35 comments | 3 min read

12 interesting things I learned studying the discovery of nature’s laws
Ben Pace | 19 Feb 2022 23:39 UTC | 265 points | 40 comments | 9 min read | 1 review

Changing the world through slack & hobbies
Steven Byrnes | 21 Jul 2022 18:11 UTC | 260 points | 13 comments | 10 min read

Comment reply: my low-quality thoughts on why CFAR didn’t get farther with a “real/efficacious art of rationality”
AnnaSalamon | 9 Jun 2022 2:12 UTC | 260 points | 63 comments | 17 min read | 1 review