Archive: Page 2
Auditing games for high-level interpretability · Paul Colognese · Nov 1, 2022, 10:44 AM · 33 points · 1 comment · 7 min read · LW link

Remember to translate your thoughts back again · brook · Nov 1, 2022, 8:49 AM · 25 points · 11 comments · 3 min read · LW link (forum.effectivealtruism.org)

Conversations on Alcohol Consumption · Annapurna · Nov 1, 2022, 5:09 AM · 20 points · 6 comments · 9 min read · LW link

ML Safety Scholars Summer 2022 Retrospective · TW123 · Nov 1, 2022, 3:09 AM · 29 points · 0 comments · LW link

EA & LW Forums Weekly Summary (24–30th Oct '22) · Zoe Williams · Nov 1, 2022, 2:58 AM · 13 points · 1 comment · LW link

Caution when interpreting Deepmind's In-context RL paper · Sam Marks · Nov 1, 2022, 2:42 AM · 105 points · 8 comments · 4 min read · LW link

What sorts of systems can be deceptive? · Andrei Alexandru · Oct 31, 2022, 10:00 PM · 16 points · 0 comments · 7 min read · LW link

"Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability · David Scott Krueger (formerly: capybaralet) · Oct 31, 2022, 9:26 PM · 51 points · 25 comments · 2 min read · LW link

Superintelligent AI is necessary for an amazing future, but far from sufficient · So8res · Oct 31, 2022, 9:16 PM · 132 points · 48 comments · 34 min read · LW link

Sanity-checking in an age of hyperbole · Ciprian Elliu Ivanof · Oct 31, 2022, 8:04 PM · 2 points · 4 comments · 2 min read · LW link

Why Aren't There More Schelling Holidays? · johnswentworth · Oct 31, 2022, 7:31 PM · 63 points · 21 comments · 1 min read · LW link

The circular problem of epistemic irresponsibility · Roman Leventov · Oct 31, 2022, 5:23 PM · 5 points · 2 comments · 8 min read · LW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution · PashaKamyshev · Oct 31, 2022, 5:03 PM · 7 points · 4 comments · 14 min read · LW link

Average utilitarianism is non-local · Yair Halberstadt · Oct 31, 2022, 4:36 PM · 29 points · 13 comments · 1 min read · LW link

Marvel Snap: Phase 1 · Zvi · Oct 31, 2022, 3:20 PM · 23 points · 1 comment · 14 min read · LW link (thezvi.wordpress.com)

Boundaries vs Frames · Scott Garrabrant · Oct 31, 2022, 3:14 PM · 58 points · 10 comments · 7 min read · LW link

Embedding safety in ML development · zeshen · Oct 31, 2022, 12:27 PM · 24 points · 1 comment · 18 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable · Esben Kran · Oct 31, 2022, 11:38 AM · 20 points · 1 comment · 1 min read · LW link (christophm.github.io)

My (naive) take on Risks from Learned Optimization · Artyom Karpov · Oct 31, 2022, 10:59 AM · 7 points · 0 comments · 5 min read · LW link

Tactical Nuclear Weapons Aren't Cost-Effective Compared to Precision Artillery · Lao Mein · Oct 31, 2022, 4:33 AM · 28 points · 7 comments · 3 min read · LW link

Gandalf or Saruman? A Soldier in Scout's Clothing · DirectedEvolution · Oct 31, 2022, 2:40 AM · 41 points · 1 comment · 4 min read · LW link

Me (Steve Byrnes) on the "Brain Inspired" podcast · Steven Byrnes · Oct 30, 2022, 7:15 PM · 26 points · 1 comment · 1 min read · LW link (braininspired.co)

"Normal" is the equilibrium state of past optimization processes · Alex_Altair · Oct 30, 2022, 7:03 PM · 82 points · 5 comments · 5 min read · LW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification · PashaKamyshev · Oct 30, 2022, 4:57 PM · 9 points · 0 comments · 10 min read · LW link

Instrumental ignoring AI, Dumb but not useless. · Donald Hobson · Oct 30, 2022, 4:55 PM · 7 points · 6 comments · 2 min read · LW link

Weekly Roundup #3 · Zvi · Oct 30, 2022, 12:20 PM · 23 points · 5 comments · 15 min read · LW link (thezvi.wordpress.com)

Quickly refactoring the U.S. Constitution · lc · Oct 30, 2022, 7:17 AM · 7 points · 25 comments · 4 min read · LW link

«Boundaries», Part 3a: Defining boundaries as directed Markov blankets · Andrew_Critch · Oct 30, 2022, 6:31 AM · 90 points · 20 comments · 15 min read · LW link

Am I secretly excited for AI getting weird? · porby · Oct 29, 2022, 10:16 PM · 116 points · 4 comments · 4 min read · LW link

AI as a Civilizational Risk Part 1/6: Historical Priors · PashaKamyshev · Oct 29, 2022, 9:59 PM · 2 points · 2 comments · 7 min read · LW link

Don't expect your life partner to be better than your exes in more than one way: a mathematical model · mdd · Oct 29, 2022, 6:47 PM · 7 points · 1 comment · 9 min read · LW link

The Social Recession: By the Numbers · antonomon · Oct 29, 2022, 6:45 PM · 165 points · 29 comments · 8 min read · LW link (novum.substack.com)

Electric Kettle vs Stove · jefftk · Oct 29, 2022, 12:50 PM · 18 points · 7 comments · 1 min read · LW link (www.jefftk.com)

Quantum Immortality, foiled · Ben · Oct 29, 2022, 11:00 AM · 27 points · 4 comments · 2 min read · LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · RowanWang, Alexandre Variengien, Arthur Conmy, Buck, and jsteinhardt · Oct 28, 2022, 11:55 PM · 101 points · 9 comments · 9 min read · LW link · 2 reviews (arxiv.org)

Resources that (I think) new alignment researchers should know about · Orpheus16 · Oct 28, 2022, 10:13 PM · 70 points · 9 comments · 4 min read · LW link

How often does One Person succeed? · Mayank Modi · Oct 28, 2022, 7:32 PM · 1 point · 3 comments · LW link

aisafety.community—A living document of AI safety communities · zeshen and plex · Oct 28, 2022, 5:50 PM · 58 points · 23 comments · 1 min read · LW link

Rapid Test Throat Swabbing? · jefftk · Oct 28, 2022, 4:30 PM · 18 points · 2 comments · 1 min read · LW link (www.jefftk.com)

Join the interpretability research hackathon · Esben Kran · Oct 28, 2022, 4:26 PM · 15 points · 0 comments · LW link

Syncretism · Annapurna · Oct 28, 2022, 4:08 PM · 16 points · 4 comments · 1 min read · LW link (jorgevelez.substack.com)

Pondering computation in the real world · Adam Shai · Oct 28, 2022, 3:57 PM · 24 points · 13 comments · 5 min read · LW link

Ukraine and the Crimea Question · ChristianKl · Oct 28, 2022, 12:26 PM · −2 points · 153 comments · 11 min read · LW link

New book on s-risks · Tobias_Baumann · Oct 28, 2022, 9:36 AM · 68 points · 1 comment · LW link

Cryptic symbols · Adam Scherlis · Oct 28, 2022, 6:44 AM · 6 points · 17 comments · 1 min read · LW link (adam.scherlis.com)

All life's helpers' beliefs · Tehdastehdas · Oct 28, 2022, 5:47 AM · −12 points · 1 comment · 5 min read · LW link

Prizes for ML Safety Benchmark Ideas · joshc · Oct 28, 2022, 2:51 AM · 36 points · 5 comments · 1 min read · LW link

Worldview iPeople—Future Fund's AI Worldview Prize · Toni MUENDEL · Oct 28, 2022, 1:53 AM · −22 points · 4 comments · 9 min read · LW link

Anatomy of change · Jose Miguel Cruz y Celis · Oct 28, 2022, 1:21 AM · 1 point · 0 comments · 1 min read · LW link

Nash equilibria of symmetric zero-sum games · Ege Erdil · Oct 27, 2022, 11:50 PM · 14 points · 0 comments · 14 min read · LW link