LessWrong — Archive: December 2022
- [Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)? (nahoj, Dec 3, 2022, 8:32 PM) · 1 point · 8 comments · 1 min read · LW link
- Logical induction for software engineers (Alex Flint, Dec 3, 2022, 7:55 PM) · 161 points · 8 comments · 27 min read · LW link · 1 review
- Utilitarianism is the only option (aelwood, Dec 3, 2022, 5:14 PM) · −13 points · 7 comments · 1 min read · LW link
- Our 2022 Giving (jefftk, Dec 3, 2022, 3:40 PM) · 33 points · 0 comments · 1 min read · LW link (www.jefftk.com)
- [Question] Is school good or bad? (tailcalled, Dec 3, 2022, 1:14 PM) · 10 points · 76 comments · 1 min read · LW link
- MrBeast’s Squid Game Tricked Me (lsusr, Dec 3, 2022, 5:50 AM) · 75 points · 1 comment · 2 min read · LW link
- Great Cryonics Survey of 2022 (Mati_Roy, Dec 3, 2022, 5:10 AM) · 16 points · 0 comments · 1 min read · LW link
- Causal scrubbing: results on induction heads (LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck, and Nate Thomas, Dec 3, 2022, 12:59 AM) · 34 points · 1 comment · 17 min read · LW link
- Causal scrubbing: results on a paren balance checker (LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck, and Nate Thomas, Dec 3, 2022, 12:59 AM) · 34 points · 2 comments · 30 min read · LW link
- Causal scrubbing: Appendix (LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck, and Nate Thomas, Dec 3, 2022, 12:58 AM) · 18 points · 4 comments · 20 min read · LW link
- Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] (LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck, and Nate Thomas, Dec 3, 2022, 12:58 AM) · 205 points · 35 comments · 20 min read · LW link · 1 review
- Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use. (Charlie Steiner, Dec 3, 2022, 12:54 AM) · 17 points · 1 comment · 2 min read · LW link
- D&D.Sci December 2022: The Boojumologist (abstractapplic, Dec 2, 2022, 11:39 PM) · 32 points · 9 comments · 2 min read · LW link
- Subsets and quotients in interpretability (Erik Jenner, Dec 2, 2022, 11:13 PM) · 26 points · 1 comment · 7 min read · LW link
- Research Principles for 6 Months of AI Alignment Studies (Shoshannah Tekofsky, Dec 2, 2022, 10:55 PM) · 23 points · 3 comments · 6 min read · LW link
- Three Fables of Magical Girls and Longtermism (Ulisse Mini, Dec 2, 2022, 10:01 PM) · 31 points · 11 comments · 2 min read · LW link
- Brun’s theorem and sieve theory (Ege Erdil, Dec 2, 2022, 8:57 PM) · 31 points · 1 comment · 73 min read · LW link
- Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan] (hannah wing-yee, Dec 2, 2022, 8:45 PM) · 3 points · 0 comments · 2 min read · LW link
- Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis (NickGabs, Dec 2, 2022, 7:14 PM) · 16 points · 2 comments · 4 min read · LW link
- [ASoT] Finetuning, RL, and GPT’s world prior (Jozdien, Dec 2, 2022, 4:33 PM) · 44 points · 8 comments · 5 min read · LW link
- NeurIPS Safety & ChatGPT. MLAISU W48 (Esben Kran and Steinthal, Dec 2, 2022, 3:50 PM) · 3 points · 0 comments · 4 min read · LW link (newsletter.apartresearch.com)
- [Question] Is ChatGPT rigth when advising to brush the tongue when brushing teeth? (ChristianKl, Dec 2, 2022, 2:53 PM) · 13 points · 14 comments · 2 min read · LW link
- Jailbreaking ChatGPT on Release Day (Zvi, Dec 2, 2022, 1:10 PM) · 242 points · 77 comments · 6 min read · LW link · 1 review (thezvi.wordpress.com)
- Deconfusing Direct vs Amortised Optimization (beren, Dec 2, 2022, 11:30 AM) · 134 points · 19 comments · 10 min read · LW link
- Inner and outer alignment decompose one hard problem into two extremely hard problems (TurnTrout, Dec 2, 2022, 2:43 AM) · 148 points · 22 comments · 47 min read · LW link · 3 reviews
- New Feature: Collaborative editing now supports logged-out users (RobertM, Dec 2, 2022, 2:41 AM) · 10 points · 0 comments · 1 min read · LW link
- Mastering Stratego (Deepmind) (svemirski, Dec 2, 2022, 2:21 AM) · 6 points · 0 comments · 1 min read · LW link (www.deepmind.com)
- Update on Harvard AI Safety Team and MIT AI Alignment (Xander Davies, Sam Marks, kaivu, tlevin, eleni, maxnadeau, and Naomi Bashkansky, Dec 2, 2022, 12:56 AM) · 60 points · 4 comments · 8 min read · LW link
- Quick look: cognitive damage from well-administered anesthesia (Elizabeth, Dec 2, 2022, 12:40 AM) · 28 points · 0 comments · 4 min read · LW link (acesounderglass.com)
- Against meta-ethical hedonism (Joe Carlsmith, Dec 2, 2022, 12:23 AM) · 24 points · 4 comments · 35 min read · LW link
- Lumenators for very lazy British people (shakeelh, Dec 2, 2022, 12:18 AM) · 16 points · 3 comments · 1 min read · LW link
- Understanding goals in complex systems (Johannes C. Mayer, Dec 1, 2022, 11:49 PM) · 9 points · 0 comments · 1 min read · LW link (www.youtube.com)
- A challenge for AGI organizations, and a challenge for readers (Rob Bensinger and Eliezer Yudkowsky, Dec 1, 2022, 11:11 PM) · 301 points · 33 comments · 2 min read · LW link
- Playing with Aerial Photos (jefftk, Dec 1, 2022, 10:50 PM) · 9 points · 0 comments · 1 min read · LW link (www.jefftk.com)
- Take 1: We’re not going to reverse-engineer the AI. (Charlie Steiner, Dec 1, 2022, 10:41 PM) · 38 points · 4 comments · 4 min read · LW link
- Re-Examining LayerNorm (Eric Winsor, Dec 1, 2022, 10:20 PM) · 127 points · 12 comments · 5 min read · LW link
- The LessWrong 2021 Review: Intellectual Circle Expansion (Ruby and Raemon, Dec 1, 2022, 9:17 PM) · 95 points · 55 comments · 8 min read · LW link
- The Plan − 2022 Update (johnswentworth, Dec 1, 2022, 8:43 PM) · 239 points · 37 comments · 8 min read · LW link · 1 review
- Finding gliders in the game of life (paulfchristiano, Dec 1, 2022, 8:40 PM) · 104 points · 8 comments · 16 min read · LW link (ai-alignment.com)
- The Machine Stops (Chapter 9) (Justin Bullock, Dec 1, 2022, 7:20 PM) · 3 points · 0 comments · 47 min read · LW link
- Covid 12/1/22: China Protests (Zvi, Dec 1, 2022, 5:10 PM) · 38 points · 2 comments · 10 min read · LW link (thezvi.wordpress.com)
- ChatGPT: First Impressions (specbug, Dec 1, 2022, 4:36 PM) · 18 points · 2 comments · 13 min read · LW link (sixeleven.in)
- [LINK] ChatGPT discussion (JanB, Dec 1, 2022, 3:04 PM) · 13 points · 8 comments · 1 min read · LW link (openai.com)
- Research request (alignment strategy): Deep dive on “making AI solve alignment for us” (JanB, Dec 1, 2022, 2:55 PM) · 16 points · 3 comments · 1 min read · LW link
- Theories of impact for Science of Deep Learning (Marius Hobbhahn, Dec 1, 2022, 2:39 PM) · 24 points · 0 comments · 11 min read · LW link
- Safe Development of Hacker-AI Countermeasures – What if we are too late? (Erland Wittkotter, Dec 1, 2022, 7:59 AM) · 3 points · 0 comments · 14 min read · LW link
- Did ChatGPT just gaslight me? (TW123, Dec 1, 2022, 5:41 AM UTC) · 123 points · 45 comments · 9 min read · LW link (aiwatchtower.substack.com)
- SBF’s comments on ethics are no surprise to virtue ethicists (c.trout, Dec 1, 2022, 4:18 AM UTC) · 36 points · 30 comments · 16 min read · LW link
- Notes on Caution (David Gross, Dec 1, 2022, 3:05 AM UTC) · 14 points · 0 comments · 19 min read · LW link
- Reestablishing Reliable Sources: A System for Tagging URLs (Riley Mueller, Dec 1, 2022, 2:27 AM UTC) · 7 points · 1 comment · 3 min read · LW link