Archive, Page 2
- ChatGPT and Ideological Turing Test · Viliam · Dec 5, 2022, 9:45 PM · 42 points · 1 comment · 1 min read · LW link
- ChatGPT on Spielberg’s A.I. and AI Alignment · Bill Benzon · Dec 5, 2022, 9:10 PM · 5 points · 0 comments · 4 min read · LW link
- Updating my AI timelines · Matthew Barnett · Dec 5, 2022, 8:46 PM · 145 points · 50 comments · 2 min read · LW link
- Steering Behaviour: Testing for (Non-)Myopia in Language Models · Evan R. Murphy and Megan Kinniment · Dec 5, 2022, 8:28 PM · 40 points · 19 comments · 10 min read · LW link
- College Admissions as a Brutal One-Shot Game · devansh · Dec 5, 2022, 8:05 PM · 8 points · 26 comments · 2 min read · LW link
- Analysis of AI Safety surveys for field-building insights · Ash Jafari · Dec 5, 2022, 7:21 PM · 11 points · 2 comments · 5 min read · LW link
- Testing Ways to Bypass ChatGPT’s Safety Features · Robert_AIZI · Dec 5, 2022, 6:50 PM · 7 points · 4 comments · 5 min read · LW link (aizi.substack.com)
- Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities · jacquesthibs · Dec 5, 2022, 4:09 PM · 28 points · 6 comments · 8 min read · LW link
- Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence · Ronny Fernandez · Dec 5, 2022, 3:19 PM · 19 points · 5 comments · 7 min read · LW link
- [Question] How should I judge the impact of giving $5k to a family of three kids and two mentally ill parents? · Blake · Dec 5, 2022, 1:42 PM · 10 points · 10 comments · 1 min read · LW link
- Is the “Valley of Confused Abstractions” real? · jacquesthibs · Dec 5, 2022, 1:36 PM · 20 points · 11 comments · 2 min read · LW link
- Take 4: One problem with natural abstractions is there’s too many of them. · Charlie Steiner · Dec 5, 2022, 10:39 AM · 37 points · 4 comments · 1 min read · LW link
- [Question] What are some good Lesswrong-related accounts or hashtags on Mastodon that I should follow? · SpectrumDT · Dec 5, 2022, 9:42 AM · 2 points · 0 comments · 1 min read · LW link
- [Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone? · Optimization Process · Dec 5, 2022, 9:12 AM · 72 points · 54 comments · 1 min read · LW link
- Monthly Shorts 11/22 · Celer · Dec 5, 2022, 7:30 AM · 8 points · 0 comments · 3 min read · LW link (keller.substack.com)
- A ChatGPT story about ChatGPT doom · SurfingOrca · Dec 5, 2022, 5:40 AM · 6 points · 2 comments · 4 min read · LW link
- A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability · Yitz · Dec 5, 2022, 5:33 AM · 26 points · 0 comments · 4 min read · LW link
- Nook Nature · Duncan Sabien (Deactivated) · Dec 5, 2022, 4:10 AM · 54 points · 18 comments · 10 min read · LW link
- Probably good projects for the AI safety ecosystem · Ryan Kidd · Dec 5, 2022, 2:26 AM · 78 points · 40 comments · 2 min read · LW link
- Historical Notes on Charitable Funds · jefftk · Dec 4, 2022, 11:30 PM · 28 points · 0 comments · 3 min read · LW link (www.jefftk.com)
- AGI as a Black Swan Event · Stephen McAleese · Dec 4, 2022, 11:00 PM · 8 points · 8 comments · 7 min read · LW link
- South Bay ACX/LW Pre-Holiday Get-Together · IS · Dec 4, 2022, 10:57 PM · 10 points · 0 comments · 1 min read · LW link
- ChatGPT is settling the Chinese Room argument · averros · Dec 4, 2022, 8:25 PM · −7 points · 7 comments · 1 min read · LW link
- Race to the Top: Benchmarks for AI Safety · Isabella Duan · Dec 4, 2022, 6:48 PM · 29 points · 6 comments · 1 min read · LW link
- Open & Welcome Thread—December 2022 · niplav · Dec 4, 2022, 3:06 PM · 8 points · 22 comments · 1 min read · LW link
- AI can exploit safety plans posted on the Internet · Peter S. Park · Dec 4, 2022, 12:17 PM · −15 points · 4 comments · LW link
- ChatGPT seems overconfident to me · qbolec · Dec 4, 2022, 8:03 AM · 19 points · 3 comments · 16 min read · LW link
- Could an AI be Religious? · mk54 · Dec 4, 2022, 5:00 AM · −12 points · 14 comments · 1 min read · LW link
- Can GPT-3 Write Contra Dances? · jefftk · Dec 4, 2022, 3:00 AM · 6 points · 4 comments · 10 min read · LW link (www.jefftk.com)
- Take 3: No indescribable heavenworlds. · Charlie Steiner · Dec 4, 2022, 2:48 AM · 23 points · 12 comments · 2 min read · LW link
- Summary of a new study on out-group hate (and how to fix it) · DirectedEvolution · Dec 4, 2022, 1:53 AM · 60 points · 30 comments · 3 min read · LW link (www.pnas.org)
- [Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)? · nahoj · Dec 3, 2022, 8:32 PM · 1 point · 8 comments · 1 min read · LW link
- Logical induction for software engineers · Alex Flint · Dec 3, 2022, 7:55 PM · 163 points · 8 comments · 27 min read · LW link · 1 review
- Utilitarianism is the only option · aelwood · Dec 3, 2022, 5:14 PM · −13 points · 7 comments · LW link
- Our 2022 Giving · jefftk · Dec 3, 2022, 3:40 PM · 33 points · 0 comments · 1 min read · LW link (www.jefftk.com)
- [Question] Is school good or bad? · tailcalled · Dec 3, 2022, 1:14 PM · 10 points · 76 comments · 1 min read · LW link
- MrBeast’s Squid Game Tricked Me · lsusr · Dec 3, 2022, 5:50 AM · 75 points · 1 comment · 2 min read · LW link
- Great Cryonics Survey of 2022 · Mati_Roy · Dec 3, 2022, 5:10 AM · 16 points · 0 comments · 1 min read · LW link
- Causal scrubbing: results on induction heads · LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · Dec 3, 2022, 12:59 AM · 34 points · 1 comment · 17 min read · LW link
- Causal scrubbing: results on a paren balance checker · LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, Tao Lin, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · Dec 3, 2022, 12:59 AM · 34 points · 2 comments · 30 min read · LW link
- Causal scrubbing: Appendix · LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · Dec 3, 2022, 12:58 AM · 18 points · 4 comments · 20 min read · LW link
- Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck and Nate Thomas · Dec 3, 2022, 12:58 AM · 206 points · 35 comments · 20 min read · LW link · 1 review
- Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use. · Charlie Steiner · Dec 3, 2022, 12:54 AM · 17 points · 1 comment · 2 min read · LW link
- D&D.Sci December 2022: The Boojumologist · abstractapplic · Dec 2, 2022, 11:39 PM · 32 points · 9 comments · 2 min read · LW link
- Subsets and quotients in interpretability · Erik Jenner · Dec 2, 2022, 11:13 PM · 26 points · 1 comment · 7 min read · LW link
- Research Principles for 6 Months of AI Alignment Studies · Shoshannah Tekofsky · Dec 2, 2022, 10:55 PM · 23 points · 3 comments · 6 min read · LW link
- Three Fables of Magical Girls and Longtermism · Ulisse Mini · Dec 2, 2022, 10:01 PM · 33 points · 11 comments · 2 min read · LW link
- Brun’s theorem and sieve theory · Ege Erdil · Dec 2, 2022, 8:57 PM · 31 points · 1 comment · 73 min read · LW link
- Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan] · hannah wing-yee · Dec 2, 2022, 8:45 PM · 3 points · 0 comments · 2 min read · LW link
- Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis · NickGabs · Dec 2, 2022, 7:14 PM · 16 points · 2 comments · 4 min read · LW link