Reward is not the optimization target · TurnTrout · 25 Jul 2022 0:03 UTC · 375 points · 123 comments · 10 min read · 3 reviews
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover · Ajeya Cotra · 18 Jul 2022 19:06 UTC · 368 points · 95 comments · 75 min read · 1 review
What should you change in response to an “emergency”? And AI risk · AnnaSalamon · 18 Jul 2022 1:11 UTC · 336 points · 60 comments · 6 min read · 1 review
Looking back on my alignment PhD · TurnTrout · 1 Jul 2022 3:19 UTC · 332 points · 66 comments · 11 min read
On how various plans miss the hard bits of the alignment challenge · So8res · 12 Jul 2022 2:49 UTC · 305 points · 89 comments · 29 min read · 3 reviews
Toni Kurz and the Insanity of Climbing Mountains · GeneSmith · 3 Jul 2022 20:51 UTC · 270 points · 67 comments · 11 min read · 2 reviews
Changing the world through slack & hobbies · Steven Byrnes · 21 Jul 2022 18:11 UTC · 261 points · 13 comments · 10 min read
Safetywashing · Adam Scholl · 1 Jul 2022 11:56 UTC · 260 points · 20 comments · 1 min read · 2 reviews
Sexual Abuse attitudes might be infohazardous · Pseudonymous Otter · 19 Jul 2022 18:06 UTC · 256 points · 72 comments · 1 min read
Humans provide an untapped wealth of evidence about alignment · TurnTrout and Quintin Pope · 14 Jul 2022 2:31 UTC · 211 points · 94 comments · 9 min read · 1 review
Unifying Bargaining Notions (1/2) · Diffractor · 25 Jul 2022 0:28 UTC · 210 points · 41 comments · 16 min read
A note about differential technological development · So8res · 15 Jul 2022 4:46 UTC · 197 points · 33 comments · 6 min read
Connor Leahy on Dying with Dignity, EleutherAI and Conjecture · Michaël Trazzi · 22 Jul 2022 18:44 UTC · 195 points · 29 comments · 14 min read · (theinsideview.ai)
AGI ruin scenarios are likely (and disjunctive) · So8res · 27 Jul 2022 3:21 UTC · 175 points · 38 comments · 6 min read
ITT-passing and civility are good; “charity” is bad; steelmanning is niche · Rob Bensinger · 5 Jul 2022 0:15 UTC · 161 points · 36 comments · 6 min read · 1 review
«Boundaries», Part 1: a key missing concept from utility theory · Andrew_Critch · 26 Jul 2022 23:03 UTC · 158 points · 33 comments · 7 min read
Resolve Cycles · CFAR!Duncan · 16 Jul 2022 23:17 UTC · 139 points · 8 comments · 10 min read
Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose · moridinamael · 6 Jul 2022 14:20 UTC · 136 points · 16 comments · 6 min read
Brainstorm of things that could force an AI team to burn their lead · So8res · 24 Jul 2022 23:58 UTC · 134 points · 8 comments · 13 min read
AI Forecasting: One Year In · jsteinhardt · 4 Jul 2022 5:10 UTC · 132 points · 12 comments · 6 min read · (bounded-regret.ghost.io)
Conjecture: Internal Infohazard Policy · Connor Leahy, Sid Black, Chris Scammell and Andrea_Miotti · 29 Jul 2022 19:07 UTC · 131 points · 6 comments · 19 min read
Limerence Messes Up Your Rationality Real Bad, Yo · Raemon · 1 Jul 2022 16:53 UTC · 127 points · 42 comments · 3 min read · 2 reviews
Principles for Alignment/Agency Projects · johnswentworth · 7 Jul 2022 2:07 UTC · 122 points · 20 comments · 4 min read
Unifying Bargaining Notions (2/2) · Diffractor · 27 Jul 2022 3:40 UTC · 118 points · 19 comments · 21 min read
Focusing · CFAR!Duncan · 29 Jul 2022 19:15 UTC · 114 points · 23 comments · 14 min read
Circumventing interpretability: How to defeat mind-readers · Lee Sharkey · 14 Jul 2022 16:59 UTC · 114 points · 15 comments · 33 min read
Moral strategies at different capability levels · Richard_Ngo · 27 Jul 2022 18:50 UTC · 112 points · 14 comments · 5 min read · (thinkingcomplete.blogspot.com)
Criticism of EA Criticism Contest · Zvi · 14 Jul 2022 14:30 UTC · 108 points · 17 comments · 31 min read · 1 review · (thezvi.wordpress.com)
Examples of AI Increasing AI Progress · TW123 · 17 Jul 2022 20:06 UTC · 107 points · 14 comments · 1 min read
Safety Implications of LeCun’s path to machine intelligence · Ivan Vendrov · 15 Jul 2022 21:47 UTC · 102 points · 18 comments · 6 min read
Comment on “Propositions Concerning Digital Minds and Society” · Zack_M_Davis · 10 Jul 2022 5:48 UTC · 99 points · 12 comments · 8 min read
Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments · Jeffrey Ladish · 11 Jul 2022 19:38 UTC · 98 points · 27 comments · 6 min read · 1 review
Naive Hypotheses on AI Alignment · Shoshannah Tekofsky · 2 Jul 2022 19:03 UTC · 98 points · 29 comments · 5 min read
A summary of every “Highlights from the Sequences” post · Akash · 15 Jul 2022 23:01 UTC · 97 points · 7 comments · 17 min read
Help ARC evaluate capabilities of current language models (still need people) · Beth Barnes · 19 Jul 2022 4:55 UTC · 95 points · 6 comments · 2 min read
MATS Models · johnswentworth · 9 Jul 2022 0:14 UTC · 94 points · 5 comments · 16 min read
Opening Session Tips & Advice · CFAR!Duncan · 25 Jul 2022 3:57 UTC · 94 points · 3 comments · 14 min read · 1 review
Human values & biases are inaccessible to the genome · TurnTrout · 7 Jul 2022 17:29 UTC · 94 points · 54 comments · 6 min read · 1 review
Internal Double Crux · CFAR!Duncan · 22 Jul 2022 4:34 UTC · 93 points · 15 comments · 12 min read
Immanuel Kant and the Decision Theory App Store · Daniel Kokotajlo · 10 Jul 2022 16:04 UTC · 92 points · 12 comments · 5 min read
Goal Factoring · CFAR!Duncan · 5 Jul 2022 7:10 UTC · 92 points · 2 comments · 8 min read
How to Diversify Conceptual Alignment: the Model Behind Refine · adamShimi · 20 Jul 2022 10:44 UTC · 87 points · 11 comments · 8 min read
Don’t use ‘infohazard’ for collectively destructive info · Eliezer Yudkowsky · 15 Jul 2022 5:13 UTC · 86 points · 33 comments · 1 min read · 2 reviews · (www.facebook.com)
Trigger-Action Planning · CFAR!Duncan · 3 Jul 2022 1:42 UTC · 86 points · 14 comments · 13 min read · 2 reviews
Trends in GPU price-performance · Marius Hobbhahn and Tamay · 1 Jul 2022 15:51 UTC · 85 points · 13 comments · 1 min read · 1 review · (epochai.org)
All AGI safety questions welcome (especially basic ones) [July 2022] · plex and Robert Miles · 16 Jul 2022 12:57 UTC · 84 points · 132 comments · 3 min read
Benchmark for successful concept extrapolation/avoiding goal misgeneralization · Stuart_Armstrong · 4 Jul 2022 20:48 UTC · 82 points · 12 comments · 4 min read
Addendum: A non-magical explanation of Jeffrey Epstein · lc · 18 Jul 2022 17:40 UTC · 81 points · 21 comments · 11 min read
Decision theory and dynamic inconsistency · paulfchristiano · 3 Jul 2022 22:20 UTC · 80 points · 33 comments · 10 min read · (sideways-view.com)
[Question] How do AI timelines affect how you live your life? · Quadratic Reciprocity · 11 Jul 2022 13:54 UTC · 80 points · 50 comments · 1 min read