Page 2
Bandwagon effect: Bias in Evaluating AGI X-Risks
  Remmelt and flandry19 · Dec 28, 2022, 7:54 AM · −1 points · 0 comments · 1 min read · LW link

Getting up to Speed on the Speed Prior in 2022
  robertzk · Dec 28, 2022, 7:49 AM · 36 points · 5 comments · 65 min read · LW link

[Question] What does “probability” really mean?
  sisyphus · Dec 28, 2022, 3:20 AM · 5 points · 20 comments · 1 min read · LW link

Zooming the Chrome Audio Player
  jefftk · Dec 28, 2022, 2:30 AM · 9 points · 0 comments · 1 min read · LW link (www.jefftk.com)

What AI Safety Materials Do ML Researchers Find Compelling?
  Vael Gates and Collin · Dec 28, 2022, 2:03 AM · 175 points · 34 comments · 2 min read · LW link

South Bay ACX/LW Meetup
  IS · Dec 28, 2022, 1:59 AM · 3 points · 0 comments · 1 min read · LW link

Regarding Blake Lemoine’s claim that LaMDA is ‘sentient’, he might be right (sorta), but perhaps not for the reasons he thinks
  philosophybear · Dec 28, 2022, 1:55 AM · 9 points · 1 comment · 6 min read · LW link

Fundamental Uncertainty: Chapter 5 - How do we know what we know?
  Gordon Seidoh Worley · Dec 28, 2022, 1:28 AM · 10 points · 2 comments · 12 min read · LW link

Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state?
  No77e · Dec 27, 2022, 8:57 PM · 5 points · 3 comments · 1 min read · LW link

Crypto-currency as pro-alignment mechanism
  False Name · Dec 27, 2022, 5:45 PM · −10 points · 2 comments · 2 min read · LW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)
  Robert_AIZI · Dec 27, 2022, 5:27 PM · 50 points · 0 comments · 4 min read · LW link (aizi.substack.com)

Things that can kill you quickly: What everyone should know about first aid
  jasoncrawford · Dec 27, 2022, 4:23 PM · 166 points · 21 comments · 2 min read · 1 review · LW link (jasoncrawford.org)

[Question] Why The Focus on Expected Utility Maximisers?
  DragonGod · Dec 27, 2022, 3:49 PM · 118 points · 84 comments · 3 min read · LW link

Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths
  Remmelt · Dec 27, 2022, 3:40 PM · −16 points · 8 comments · 2 min read · LW link (mflb.com)

Mere exposure effect: Bias in Evaluating AGI X-Risks
  Remmelt and flandry19 · Dec 27, 2022, 2:05 PM · 0 points · 2 comments · 1 min read · LW link

Housing and Transportation Roundup #2
  Zvi · Dec 27, 2022, 1:10 PM · 25 points · 0 comments · 12 min read · LW link (thezvi.wordpress.com)

[Question] Are tulpas moral patients?
  ChristianKl · Dec 27, 2022, 11:30 AM · 16 points · 28 comments · 1 min read · LW link

Reflections on my 5-month alignment upskilling grant
  Jay Bailey · Dec 27, 2022, 10:51 AM · 82 points · 4 comments · 8 min read · LW link

Institutions Cannot Restrain Dark-Triad AI Exploitation
  Remmelt and flandry19 · Dec 27, 2022, 10:34 AM · 5 points · 0 comments · 5 min read · LW link (mflb.com)

Introduction: Bias in Evaluating AGI X-Risks
  Remmelt and flandry19 · Dec 27, 2022, 10:27 AM · 1 point · 0 comments · 3 min read · LW link

MDPs and the Bellman Equation, Intuitively Explained
  Jack O'Brien · Dec 27, 2022, 5:50 AM · 11 points · 3 comments · 14 min read · LW link

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics
  Remmelt and flandry19 · Dec 27, 2022, 3:16 AM · −2 points · 5 comments · 2 min read · LW link (mflb.com)

Nine Points of Collective Insanity
  Remmelt and flandry19 · Dec 27, 2022, 3:14 AM · −2 points · 3 comments · 1 min read · LW link (mflb.com)

Fractional Resignation
  jefftk · Dec 27, 2022, 2:30 AM · 19 points · 6 comments · 1 min read · LW link (www.jefftk.com)

[Question] What policies have most thoroughly crippled (otherwise-promising) industries or technologies?
  benwr · Dec 27, 2022, 2:25 AM · 40 points · 4 comments · 1 min read · LW link

Recent advances in Natural Language Processing—Some Woolly speculations (2019 essay on semantics and language models)
  philosophybear · Dec 27, 2022, 2:11 AM · 1 point · 0 comments · 7 min read · LW link

Against Agents as an Approach to Aligned Transformative AI
  DragonGod · Dec 27, 2022, 12:47 AM · 12 points · 9 comments · 2 min read · LW link

Can we efficiently distinguish different mechanisms?
  paulfchristiano · Dec 27, 2022, 12:20 AM · 91 points · 30 comments · 16 min read · LW link (ai-alignment.com)

Air-gapping evaluation and support
  Ryan Kidd · Dec 26, 2022, 10:52 PM · 53 points · 1 comment · 2 min read · LW link

Slightly against aligning with neo-luddites
  Matthew Barnett · Dec 26, 2022, 10:46 PM · 104 points · 31 comments · 4 min read · LW link

Avoiding perpetual risk from TAI
  scasper · Dec 26, 2022, 10:34 PM · 15 points · 6 comments · 5 min read · LW link

Announcing: The Independent AI Safety Registry
  Shoshannah Tekofsky · Dec 26, 2022, 9:22 PM · 53 points · 9 comments · 1 min read · LW link

Are men harder to help?
  braces · Dec 26, 2022, 9:11 PM · 35 points · 1 comment · 2 min read · LW link

[Question] How much should I update on the fact that my dentist is named Dennis?
  MichaelDickens · Dec 26, 2022, 7:11 PM · 2 points · 3 comments · 1 min read · LW link

Theodicy and the simulation hypothesis, or: The problem of simulator evil
  philosophybear · Dec 26, 2022, 6:55 PM · 12 points · 12 comments · 19 min read · LW link (philosophybear.substack.com)

Safety of Self-Assembled Neuromorphic Hardware
  Can · Dec 26, 2022, 6:51 PM · 16 points · 2 comments · 10 min read · LW link (forum.effectivealtruism.org)

Coherent extrapolated dreaming
  Alex Flint · Dec 26, 2022, 5:29 PM · 38 points · 10 comments · 17 min read · LW link

An overview of some promising work by junior alignment researchers
  Orpheus16 · Dec 26, 2022, 5:23 PM · 34 points · 0 comments · 4 min read · LW link

Solstice song: Here Lies the Dragon
  jchan · Dec 26, 2022, 4:08 PM · 8 points · 1 comment · 2 min read · LW link

The Usefulness Paradigm
  Aprillion · Dec 26, 2022, 1:23 PM · 4 points · 4 comments · 1 min read · LW link

Looking Back on Posts From 2022
  Zvi · Dec 26, 2022, 1:20 PM · 50 points · 8 comments · 17 min read · LW link (thezvi.wordpress.com)

Analogies between Software Reverse Engineering and Mechanistic Interpretability
  Neel Nanda and Itay Yona · Dec 26, 2022, 12:26 PM · 34 points · 6 comments · 11 min read · LW link (www.neelnanda.io)

Mlyyrczo
  lsusr · Dec 26, 2022, 7:58 AM · 41 points · 14 comments · 3 min read · LW link

Causal abstractions vs infradistributions
  Pablo Villalobos · Dec 26, 2022, 12:21 AM · 24 points · 0 comments · 6 min read · LW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability
  Neel Nanda · Dec 25, 2022, 10:21 PM · 57 points · 7 comments · 12 min read · LW link (www.neelnanda.io)
It’s time to worry about online privacy again
  Malmesbury · Dec 25, 2022, 9:05 PM UTC · 68 points · 23 comments · 6 min read · LW link

[Hebbian Natural Abstractions] Mathematical Foundations
  Samuel Nellessen and Jan · Dec 25, 2022, 8:58 PM UTC · 15 points · 2 comments · 6 min read · LW link (www.snellessen.com)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)
  RationalSieve · Dec 25, 2022, 8:14 PM UTC · 3 points · 6 comments · 1 min read · LW link

YCombinator fraud rates
  Xodarap · Dec 25, 2022, 7:21 PM UTC · 56 points · 3 comments · LW link

How evolutionary lineages of LLMs can plan their own future and act on these plans
  Roman Leventov · Dec 25, 2022, 6:11 PM UTC · 39 points · 16 comments · 8 min read · LW link