Archive · Page 2
Aversion Factoring · CFAR!Duncan · Jul 7, 2022, 4:09 PM · 79 points · 1 comment · 8 min read
Abstracting The Hardness of Alignment: Unbounded Atomic Optimization · adamShimi · Jul 29, 2022, 6:59 PM · 75 points · 3 comments · 16 min read
Which values are stable under ontology shifts? · Richard_Ngo · Jul 23, 2022, 2:40 AM · 75 points · 48 comments · 3 min read · (thinkingcomplete.blogspot.com)
A Pattern Language For Rationality · Vaniver · Jul 5, 2022, 7:08 PM · 75 points · 14 comments · 15 min read
Principles of Privacy for Alignment Research · johnswentworth · Jul 27, 2022, 7:53 PM · 73 points · 31 comments · 7 min read
NeurIPS ML Safety Workshop 2022 · Dan H · Jul 26, 2022, 3:28 PM · 72 points · 2 comments · 1 min read · (neurips2022.mlsafety.org)
A time-invariant version of Laplace’s rule · Jsevillamol and Ege Erdil · Jul 15, 2022, 7:28 PM · 72 points · 13 comments · 17 min read · (epochai.org)
Cognitive Risks of Adolescent Binge Drinking · Elizabeth and Martin Bernstorff · Jul 20, 2022, 9:10 PM · 70 points · 12 comments · 10 min read · (acesounderglass.com)
Avoid the abbreviation “FLOPs” – use “FLOP” or “FLOP/s” instead · Daniel_Eth · Jul 10, 2022, 10:44 AM · 70 points · 13 comments · 1 min read
Taste & Shaping · CFAR!Duncan · Jul 10, 2022, 5:50 AM · 67 points · 1 comment · 16 min read
My vision of a good future, part I · Jeffrey Ladish · Jul 6, 2022, 1:23 AM · 66 points · 18 comments · 9 min read
Curating “The Epistemic Sequences” (list v.0.1) · Andrew_Critch · Jul 23, 2022, 10:17 PM · 65 points · 12 comments · 7 min read
Applications are open for CFAR workshops in Prague this fall! · John Steidley · Jul 19, 2022, 6:29 PM · 64 points · 3 comments · 2 min read
What’s next for instrumental rationality? · Andrew_Critch · Jul 23, 2022, 10:55 PM · 63 points · 7 comments · 1 min read
Introducing the Fund for Alignment Research (We’re Hiring!) · AdamGleave, Scott Emmons, Ethan Perez and Claudia Shi · Jul 6, 2022, 2:07 AM · 62 points · 0 comments · 4 min read
Response to Blake Richards: AGI, generality, alignment, & loss functions · Steven Byrnes · Jul 12, 2022, 1:56 PM · 62 points · 9 comments · 15 min read
Double Crux · CFAR!Duncan · Jul 24, 2022, 6:34 AM · 61 points · 9 comments · 11 min read
My Most Likely Reason to Die Young is AI X-Risk · AISafetyIsNotLongtermist · Jul 4, 2022, 5:08 PM · 61 points · 24 comments · 4 min read · (forum.effectivealtruism.org)
Conditioning Generative Models for Alignment · Jozdien · Jul 18, 2022, 7:11 AM · 60 points · 8 comments · 20 min read
When Giving People Money Doesn’t Help · Zvi · Jul 7, 2022, 1:00 PM · 58 points · 12 comments · 10 min read · (thezvi.wordpress.com)
A Bias Against Altruism · Lone Pine · Jul 23, 2022, 8:44 PM · 58 points · 30 comments · 2 min read
The Reader’s Guide to Optimal Monetary Policy · Ege Erdil · Jul 25, 2022, 3:10 PM · 57 points · 10 comments · 14 min read
Deep learning curriculum for large language model alignment · Jacob_Hilton · Jul 13, 2022, 9:58 PM · 57 points · 3 comments · 1 min read · (github.com)
Deception?! I ain’t got time for that! · Paul Colognese · Jul 18, 2022, 12:06 AM · 55 points · 5 comments · 13 min read
[AN #172] Sorry for the long hiatus! · Rohin Shah · Jul 5, 2022, 6:20 AM · 54 points · 0 comments · 3 min read · (mailchi.mp)
Don’t take the organizational chart literally · lc · Jul 21, 2022, 12:56 AM · 54 points · 21 comments · 4 min read
Procedural Executive Function, Part 1 · DaystarEld · Jul 4, 2022, 6:51 PM · 52 points · 8 comments · 14 min read · (daystareld.com)
Comfort Zone Exploration · CFAR!Duncan · Jul 15, 2022, 9:18 PM · 51 points · 2 comments · 12 min read
Outer vs inner misalignment: three framings · Richard_Ngo · Jul 6, 2022, 7:46 PM · 51 points · 5 comments · 9 min read
Race Along Rashomon Ridge · Stephen Fowler, Peter S. Park and MichaelEinhorn · Jul 7, 2022, 3:20 AM · 50 points · 15 comments · 8 min read
Making decisions using multiple worldviews · Richard_Ngo · Jul 13, 2022, 7:15 PM · 50 points · 10 comments · 11 min read
Acceptability Verification: A Research Agenda · David Udell and evhub · Jul 12, 2022, 8:11 PM · 50 points · 0 comments · 1 min read · (docs.google.com)
Report from a civilizational observer on Earth · owencb · Jul 9, 2022, 5:26 PM · 49 points · 12 comments · 6 min read
Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding · Vael Gates · Jul 28, 2022, 9:29 PM · 49 points · 3 comments · 6 min read
Potato diet: A post mortem and an answer to SMTM’s article · Épiphanie Gédéon · Jul 14, 2022, 11:18 PM · 48 points · 34 comments · 16 min read
The Alignment Problem · lsusr · Jul 11, 2022, 3:03 AM · 46 points · 18 comments · 3 min read
Babysitting as Parenting Trial? · jefftk · Jul 7, 2022, 1:20 PM · 46 points · 19 comments · 3 min read · (www.jefftk.com)
The Most Important Century: The Animation · Writer and Matthew Barnett · Jul 24, 2022, 8:58 PM · 46 points · 2 comments · 20 min read · (youtu.be)
Deontological Evil · lsusr · Jul 2, 2022, 6:57 AM · 45 points · 4 comments · 2 min read
Eavesdropping on Aliens: A Data Decoding Challenge · anonymousaisafety · Jul 24, 2022, 4:35 AM · 44 points · 9 comments · 4 min read
Tarnished Guy who Puts a Num on it · Jacob Falkovich · Jul 6, 2022, 6:05 PM · 44 points · 11 comments · 4 min read
Goal Alignment Is Robust To the Sharp Left Turn · Thane Ruthenis · Jul 13, 2022, 8:23 PM · 43 points · 16 comments · 4 min read
Bucket Errors · CFAR!Duncan · Jul 29, 2022, 6:50 PM · 43 points · 7 comments · 11 min read
Systemization · CFAR!Duncan · Jul 11, 2022, 6:39 PM · 42 points · 5 comments · 12 min read
Safety considerations for online generative modeling · Sam Marks · Jul 7, 2022, 6:31 PM · 42 points · 9 comments · 14 min read
Artificial Sandwiching: When can we test scalable alignment protocols without humans? · Sam Bowman · Jul 13, 2022, 9:14 PM · 42 points · 6 comments · 5 min read
Meiosis is all you need · Metacelsus · Jul 1, 2022, 7:39 AM · 41 points · 3 comments · 2 min read · (denovo.substack.com)
The curious case of Pretty Good human inner/outer alignment · PavleMiha · Jul 5, 2022, 7:04 PM · 41 points · 45 comments · 4 min read
QNR Prospects · PeterMcCluskey · Jul 16, 2022, 2:03 AM · 40 points · 3 comments · 8 min read · (www.bayesianinvestor.com)
[Linkpost] Existential Risk Analysis in Empirical Research Papers · Dan H · Jul 2, 2022, 12:09 AM · 40 points · 0 comments · 1 min read · (arxiv.org)