Archive (Page 2)
Conditioning Generative Models for Alignment · Jozdien · Jul 18, 2022, 7:11 AM · 60 points · 8 comments · 20 min read · LW link
Training goals for large language models · Johannes Treutlein · Jul 18, 2022, 7:09 AM · 28 points · 5 comments · 19 min read · LW link
A distillation of Evan Hubinger’s training stories (for SERI MATS) · Daphne_W · Jul 18, 2022, 3:38 AM · 15 points · 1 comment · 10 min read · LW link
Forecasting ML Benchmarks in 2023 · jsteinhardt · Jul 18, 2022, 2:50 AM · 36 points · 20 comments · 12 min read · LW link (bounded-regret.ghost.io)
What should you change in response to an “emergency”? And AI risk · AnnaSalamon · Jul 18, 2022, 1:11 AM · 338 points · 60 comments · 6 min read · LW link · 1 review
Deception?! I ain’t got time for that! · Paul Colognese · Jul 18, 2022, 12:06 AM · 55 points · 5 comments · 13 min read · LW link
How Interpretability can be Impactful · Connall Garrod · Jul 18, 2022, 12:06 AM · 18 points · 0 comments · 37 min read · LW link
Why you might expect homogeneous take-off: evidence from ML research · Andrei Alexandru · Jul 17, 2022, 8:31 PM · 24 points · 0 comments · 10 min read · LW link
Examples of AI Increasing AI Progress · TW123 · Jul 17, 2022, 8:06 PM · 107 points · 14 comments · 1 min read · LW link
Four questions I ask AI safety researchers · Orpheus16 · Jul 17, 2022, 5:25 PM · 17 points · 0 comments · 1 min read · LW link
Why I Think Abrupt AI Takeoff · lincolnquirk · Jul 17, 2022, 5:04 PM · 14 points · 6 comments · 1 min read · LW link
Culture wars in riddle format · Malmesbury · Jul 17, 2022, 2:51 PM · 7 points · 28 comments · 3 min read · LW link
Bangalore LW/ACX Meetup in person · Vyakart · Jul 17, 2022, 6:53 AM · 1 point · 0 comments · 1 min read · LW link
Resolve Cycles · CFAR!Duncan · Jul 16, 2022, 11:17 PM · 140 points · 8 comments · 10 min read · LW link
Alignment as Game Design · Shoshannah Tekofsky · Jul 16, 2022, 10:36 PM · 11 points · 7 comments · 2 min read · LW link
Risk Management from a Climbers Perspective · Annapurna · Jul 16, 2022, 9:14 PM · 5 points · 0 comments · 6 min read · LW link (jorgevelez.substack.com)
Cognitive Instability, Physicalism, and Free Will · dadadarren · Jul 16, 2022, 1:13 PM · 5 points · 27 comments · 2 min read · LW link (www.sleepingbeautyproblem.com)
All AGI safety questions welcome (especially basic ones) [July 2022] · plex and Robert Miles · Jul 16, 2022, 12:57 PM · 84 points · 132 comments · 3 min read · LW link
QNR Prospects · PeterMcCluskey · Jul 16, 2022, 2:03 AM · 40 points · 3 comments · 8 min read · LW link (www.bayesianinvestor.com)
To-do waves · Paweł Sysiak · Jul 16, 2022, 1:19 AM · 3 points · 0 comments · 3 min read · LW link
Moneypumping Bryan Caplan’s Belief in Free Will · Morpheus · Jul 16, 2022, 12:46 AM · 5 points · 9 comments · 1 min read · LW link
A summary of every “Highlights from the Sequences” post · Orpheus16 · Jul 15, 2022, 11:01 PM · 98 points · 7 comments · 17 min read · LW link
Safety Implications of LeCun’s path to machine intelligence · Ivan Vendrov · Jul 15, 2022, 9:47 PM · 102 points · 18 comments · 6 min read · LW link
Comfort Zone Exploration · CFAR!Duncan · Jul 15, 2022, 9:18 PM · 51 points · 2 comments · 12 min read · LW link
A time-invariant version of Laplace’s rule · Jsevillamol and Ege Erdil · Jul 15, 2022, 7:28 PM · 72 points · 13 comments · 17 min read · LW link (epochai.org)
An attempt to break circularity in science · fryolysis · Jul 15, 2022, 6:32 PM · 3 points · 5 comments · 1 min read · LW link
A story about a duplicitous API · LiLiLi · Jul 15, 2022, 6:26 PM · 2 points · 0 comments · 1 min read · LW link
Highlights from the memoirs of Vannevar Bush · jasoncrawford · Jul 15, 2022, 6:08 PM · 11 points · 0 comments · 13 min read · LW link (rootsofprogress.org)
Notes on Learning the Prior · carboniferous_umbraculum · Jul 15, 2022, 5:28 PM · 25 points · 2 comments · 25 min read · LW link
Review of The Engines of Cognition · William Gasarch · Jul 15, 2022, 2:13 PM · 14 points · 5 comments · 15 min read · LW link
A review of Nate Hilger’s The Parent Trap · David Hugh-Jones · Jul 15, 2022, 9:30 AM · 15 points · 8 comments · 4 min read · LW link (wyclif.substack.com)
Musings on the Human Objective Function · Michael Soareverix · Jul 15, 2022, 7:13 AM · 3 points · 0 comments · 3 min read · LW link
Peter Singer’s first published piece on AI · Fai · Jul 15, 2022, 6:18 AM · 20 points · 5 comments · 1 min read · LW link (link.springer.com)
Don’t use ‘infohazard’ for collectively destructive info · Eliezer Yudkowsky · Jul 15, 2022, 5:13 AM · 86 points · 33 comments · 1 min read · LW link · 2 reviews (www.facebook.com)
Upcoming heatwave: advice · stavros · Jul 15, 2022, 5:03 AM · 16 points · 13 comments · 3 min read · LW link
A note about differential technological development · So8res · Jul 15, 2022, 4:46 AM · 197 points · 33 comments · 6 min read · LW link
Inward and outward steelmanning · Q Home · Jul 14, 2022, 11:32 PM · 13 points · 6 comments · 18 min read · LW link
Potato diet: A post mortem and an answer to SMTM’s article · Épiphanie Gédéon · Jul 14, 2022, 11:18 PM · 48 points · 34 comments · 16 min read · LW link
Proposed Orthogonality Theses #2-5 · rjbg · Jul 14, 2022, 10:59 PM · 8 points · 0 comments · 2 min read · LW link
Better Quiddler · jefftk · Jul 14, 2022, 5:40 PM · 17 points · 0 comments · 1 min read · LW link (www.jefftk.com)
Circumventing interpretability: How to defeat mind-readers · Lee Sharkey · Jul 14, 2022, 4:59 PM · 114 points · 15 comments · 33 min read · LW link
Covid 7/14/22: BA.2.75 Plus Tax · Zvi · Jul 14, 2022, 2:40 PM · 39 points · 9 comments · 8 min read · LW link (thezvi.wordpress.com)
Criticism of EA Criticism Contest · Zvi · Jul 14, 2022, 2:30 PM · 108 points · 17 comments · 31 min read · LW link · 1 review (thezvi.wordpress.com)
Humans provide an untapped wealth of evidence about alignment · TurnTrout and Quintin Pope · Jul 14, 2022, 2:31 AM · 212 points · 94 comments · 9 min read · LW link · 1 review
[Question] Wacky, risky, anti-inductive intelligence-enhancement methods? · Nicholas / Heather Kross · Jul 14, 2022, 1:40 AM · 20 points · 30 comments · 1 min read · LW link
[Question] How to impress students with recent advances in ML? · Charbel-Raphaël · Jul 14, 2022, 12:03 AM · 12 points · 2 comments · 1 min read · LW link
Notes on Love · David Gross · Jul 13, 2022, 11:35 PM · 18 points · 3 comments · 29 min read · LW link
Deep learning curriculum for large language model alignment · Jacob_Hilton · Jul 13, 2022, 9:58 PM · 57 points · 3 comments · 1 min read · LW link (github.com)
Artificial Sandwiching: When can we test scalable alignment protocols without humans? · Sam Bowman · Jul 13, 2022, 9:14 PM · 42 points · 6 comments · 5 min read · LW link
[Question] Any tips for eliciting one’s own latent knowledge? · MSRayne · Jul 13, 2022, 9:12 PM · 16 points · 20 comments · 2 min read · LW link