The Talk: a brief explanation of sexual dimorphism
Malmesbury
Sep 18, 2023, 4:23 PM
520
points
75
comments
16
min read
LW
link
3
reviews
Inside Views, Impostor Syndrome, and the Great LARP
johnswentworth
Sep 25, 2023, 4:08 PM
335
points
53
comments
5
min read
LW
link
Sharing Information About Nonlinear
Ben Pace
Sep 7, 2023, 6:51 AM
323
points
323
comments
34
min read
LW
link
EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem
Elizabeth
Sep 28, 2023, 11:30 PM
317
points
250
comments
22
min read
LW
link
2
reviews
(acesounderglass.com)
Sum-threshold attacks
TsviBT
Sep 8, 2023, 5:13 PM
238
points
55
comments
10
min read
LW
link
(tsvibt.blogspot.com)
What I would do if I wasn’t at ARC Evals
LawrenceC
Sep 5, 2023, 7:19 PM
220
points
10
comments
13
min read
LW
link
1
review
UDT shows that decision theory is more puzzling than ever
Wei Dai
Sep 13, 2023, 12:26 PM
218
points
56
comments
1
min read
LW
link
AI presidents discuss AI alignment agendas
TurnTrout
and
Garrett Baker
Sep 9, 2023, 6:55 PM
217
points
23
comments
1
min read
LW
link
(www.youtube.com)
The King and the Golem
Richard_Ngo
Sep 25, 2023, 7:51 PM
190
points
19
comments
5
min read
LW
link
1
review
(narrativeark.substack.com)
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX
Bird Concept
Sep 1, 2023, 4:03 AM
188
points
26
comments
24
min read
LW
link
1
review
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB
,
Owain_Evans
and
SoerenMind
Sep 28, 2023, 6:53 PM
187
points
39
comments
3
min read
LW
link
1
review
There should be more AI safety orgs
Marius Hobbhahn
Sep 21, 2023, 2:53 PM
181
points
25
comments
17
min read
LW
link
Defunding My Mistake
ymeskhout
Sep 4, 2023, 2:43 PM
175
points
41
comments
6
min read
LW
link
Meta Questions about Metaphilosophy
Wei Dai
Sep 1, 2023, 1:17 AM
161
points
80
comments
3
min read
LW
link
“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
titotal
Sep 29, 2023, 2:01 PM
160
points
79
comments
LW
link
(titotal.substack.com)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs
,
Hoagy
,
Aidan Ewart
and
Robert_AIZI
Sep 21, 2023, 3:30 PM
159
points
8
comments
5
min read
LW
link
Cohabitive Games so Far
mako yass
Sep 28, 2023, 3:41 PM
131
points
146
comments
19
min read
LW
link
2
reviews
(makopool.com)
One Minute Every Moment
abramdemski
Sep 1, 2023, 8:23 PM
125
points
23
comments
3
min read
LW
link
The smallest possible button (or: moth traps!)
Neil
Sep 2, 2023, 3:24 PM
122
points
18
comments
3
min read
LW
link
(neilwarren.substack.com)
Paper: LLMs trained on “A is B” fail to learn “B is A”
lberglund
,
Owain_Evans
,
Meg
,
Maximilian Kaufmann
,
Mikita Balesni
,
Asa Cooper Stickland
and
Tomek Korbak
Sep 23, 2023, 7:55 PM
121
points
74
comments
4
min read
LW
link
(arxiv.org)
Making AIs less likely to be spiteful
Nicolas Macé
,
Anthony DiGiovanni
and
JesseClifton
Sep 26, 2023, 2:12 PM
118
points
7
comments
10
min read
LW
link
Interpreting OpenAI’s Whisper
EllenaR
Sep 24, 2023, 5:53 PM
116
points
13
comments
7
min read
LW
link
“X distracts from Y” as a thinly-disguised fight over group status / politics
Steven Byrnes
Sep 25, 2023, 3:18 PM
112
points
14
comments
8
min read
LW
link
Paper: On measuring situational awareness in LLMs
Owain_Evans
,
Daniel Kokotajlo
,
Mikita Balesni
,
Tomek Korbak
,
Asa Cooper Stickland
,
Meg
and
Maximilian Kaufmann
Sep 4, 2023, 12:54 PM
109
points
16
comments
5
min read
LW
link
(arxiv.org)
ActAdd: Steering Language Models without Optimization
technicalities
,
TurnTrout
,
lisathiergart
,
David Udell
,
Ulisse Mini
and
Monte M
Sep 6, 2023, 5:21 PM
105
points
3
comments
2
min read
LW
link
(arxiv.org)
PSA: The community is in Berkeley/Oakland, not “the Bay Area”
maia
Sep 11, 2023, 3:59 PM
104
points
7
comments
1
min read
LW
link
Reproducing ARC Evals’ recent report on language model agents
Thomas Broadley
Sep 1, 2023, 4:52 PM
104
points
17
comments
3
min read
LW
link
(thomasbroadley.com)
Explaining grokking through circuit efficiency
Vikrant Varma
and
Rohin Shah
Sep 8, 2023, 2:39 PM
101
points
11
comments
3
min read
LW
link
(arxiv.org)
Would You Work Harder In The Least Convenient Possible World?
Firinn
Sep 22, 2023, 5:17 AM
99
points
100
comments
9
min read
LW
link
2
reviews
Closing Notes on Nonlinear Investigation
Ben Pace
Sep 15, 2023, 10:44 PM
97
points
47
comments
11
min read
LW
link
Atoms to Agents Proto-Lectures
johnswentworth
Sep 22, 2023, 6:22 AM
96
points
14
comments
2
min read
LW
link
(www.youtube.com)
Announcing FAR Labs, an AI safety coworking space
Ben Goldhaber
Sep 29, 2023, 4:52 PM
95
points
0
comments
1
min read
LW
link
Logical Share Splitting
DaemonicSigil
Sep 11, 2023, 4:08 AM
93
points
16
comments
9
min read
LW
link
(pbement.com)
I compiled a ebook of `Project Lawful` for eBook readers
OrwellGoesShopping
Sep 15, 2023, 6:09 PM
90
points
4
comments
1
min read
LW
link
(www.mikescher.com)
AI #31: It Can Do What Now?
Zvi
Sep 28, 2023, 4:00 PM
90
points
6
comments
40
min read
LW
link
(thezvi.wordpress.com)
Benchmarks for Detecting Measurement Tampering [Redwood Research]
ryan_greenblatt
and
Fabien Roger
Sep 5, 2023, 4:44 PM
87
points
22
comments
20
min read
LW
link
1
review
(arxiv.org)
Highlights: Wentworth, Shah, and Murphy on “Retargeting the Search”
RobertM
Sep 14, 2023, 2:18 AM
87
points
4
comments
8
min read
LW
link
Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust
Zac Hatfield-Dodds
Sep 19, 2023, 3:09 PM
83
points
26
comments
3
min read
LW
link
1
review
(www.anthropic.com)
[Question]
How have you become more hard-working?
Chi Nguyen
Sep 25, 2023, 12:37 PM
82
points
42
comments
LW
link
Memory bandwidth constraints imply economies of scale in AI inference
Ege Erdil
Sep 17, 2023, 2:01 PM
79
points
34
comments
4
min read
LW
link
Navigating an ecosystem that might or might not be bad for the world
habryka
and
kave
Sep 15, 2023, 11:58 PM
79
points
20
comments
1
min read
LW
link
Find Hot French Food Near Me: A Follow-up
aphyer
Sep 6, 2023, 12:32 PM
75
points
19
comments
2
min read
LW
link
Luck based medicine: angry eldritch sugar gods edition
Elizabeth
Sep 19, 2023, 4:40 AM
75
points
14
comments
9
min read
LW
link
(acesounderglass.com)
Text Posts from the Kids Group: 2023 I
jefftk
Sep 5, 2023, 2:00 AM
75
points
3
comments
7
min read
LW
link
(www.jefftk.com)
AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo
Zvi
Sep 21, 2023, 12:00 PM
75
points
8
comments
47
min read
LW
link
(thezvi.wordpress.com)
[Question]
How to talk about reasons why AGI might not be near?
Kaj_Sotala
Sep 17, 2023, 8:18 AM
73
points
19
comments
2
min read
LW
link
High-level interpretability: detecting an AI’s objectives
Paul Colognese
and
Jozdien
Sep 28, 2023, 7:30 PM
72
points
4
comments
21
min read
LW
link
A quick update from Nonlinear
KatWoods
Sep 7, 2023, 9:28 PM
72
points
23
comments
2
min read
LW
link
Influence functions—why, what and how
Nina Panickssery
Sep 15, 2023, 8:42 PM
71
points
6
comments
8
min read
LW
link
Have Attention Spans Been Declining?
niplav
Sep 8, 2023, 2:11 PM
71
points
22
comments
17
min read
LW
link
1
review