Response to Holden’s alignment plan
Alex Flint · Dec 22, 2022, 4:08 PM · 36 points · 4 comments · 6 min read
Staring into the abyss as a core life skill
benkuhn · Dec 22, 2022, 3:30 PM · 356 points · 22 comments · 12 min read · 1 review · (www.benkuhn.net)
Secular Solstice for children
juliawise and denkenberger · Dec 22, 2022, 2:33 PM · 31 points · 1 comment · 3 min read
Mental acceptance and reflection
remember and Gabriel Alfour · Dec 22, 2022, 2:32 PM · 34 points · 1 comment · 2 min read
Against Diversification
Jack Malde · Dec 22, 2022, 1:29 PM · 4 points · 0 comments · 3 min read · (ethicaleconomist.substack.com)
Notes on Meta’s Diplomacy-Playing AI
Erich_Grunewald · Dec 22, 2022, 11:34 AM · 15 points · 2 comments · 14 min read · (www.erichgrunewald.com)
Take 13: RLHF bad, conditioning good.
Charlie Steiner · Dec 22, 2022, 10:44 AM · 54 points · 4 comments · 2 min read
Applied Linear Algebra Lecture Series
johnswentworth · Dec 22, 2022, 6:57 AM · 103 points · 8 comments · 1 min read
Naive Set Theory, Halmos
David Udell · Dec 22, 2022, 2:34 AM · 11 points · 1 comment · 8 min read
Not Getting Hacked
jefftk · Dec 21, 2022, 9:40 PM · 40 points · 14 comments · 7 min read · (www.jefftk.com)
Metaphor.systems
the gears to ascension · Dec 21, 2022, 9:31 PM · 25 points · 9 comments · 1 min read · (metaphor.systems)
[Question] How much is DQC (Dynamic Quantum Clustering) currently looked into in AI Capabilities Research?
macmillan · Dec 21, 2022, 8:46 PM · 1 point · 0 comments · 1 min read
Think wider about the root causes of progress
jasoncrawford · Dec 21, 2022, 8:05 PM · 49 points · 11 comments · 4 min read · (rootsofprogress.org)
[Question] What readings did you consider best for the happy parts of the secular solstice?
ChristianKl · Dec 21, 2022, 3:45 PM · 17 points · 0 comments · 1 min read
Recreating logic in type theory
Thomas Kehrenberg · Dec 21, 2022, 3:19 PM · 18 points · 0 comments · 13 min read
You become the UI you use
Viliam · Dec 21, 2022, 3:04 PM · 21 points · 7 comments · 2 min read
Price’s equation for neural networks
tailcalled · Dec 21, 2022, 1:09 PM · 29 points · 4 comments · 2 min read
Decisions: Ontologically Shifting to Determinism
Chris_Leong · Dec 21, 2022, 12:41 PM · 8 points · 11 comments · 6 min read
A Comprehensive Mechanistic Interpretability Explainer & Glossary
Neel Nanda · Dec 21, 2022, 12:35 PM · 91 points · 6 comments · 2 min read · (neelnanda.io)
Google Search loses to ChatGPT fair and square
Shmi · Dec 21, 2022, 8:11 AM · 14 points · 17 comments · 1 min read · (www.surgehq.ai)
Sazen
Duncan Sabien (Deactivated) · Dec 21, 2022, 7:54 AM · 285 points · 83 comments · 12 min read · 2 reviews
Podcast: What’s Wrong With LessWrong
Alfred · Dec 21, 2022, 7:06 AM · −32 points · 11 comments · 1 min read · (youtu.be)
New AI risk intro from Vox [link post]
JakubK · Dec 21, 2022, 6:00 AM · 5 points · 1 comment · 2 min read · (www.vox.com)
Local Memes Against Geometric Rationality
Scott Garrabrant · Dec 21, 2022, 3:53 AM · 90 points · 3 comments · 6 min read
Logging Shell History in Zsh
jefftk · Dec 21, 2022, 3:30 AM · 19 points · 2 comments · 1 min read · (www.jefftk.com)
CIRL Corrigibility is Fragile
Rachel Freedman and AdamGleave · Dec 21, 2022, 1:40 AM · 58 points · 8 comments · 12 min read
[Question] [DISC] Are Values Robust?
DragonGod · Dec 21, 2022, 1:00 AM · 12 points · 9 comments · 2 min read
Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values
Garrett Baker · Dec 21, 2022, 12:44 AM · 9 points · 10 comments · 5 min read
Progress links and tweets, 2022-12-20
jasoncrawford · Dec 21, 2022, 12:35 AM · 12 points · 0 comments · 2 min read · (rootsofprogress.org)
K-complexity is silly; use cross-entropy instead
So8res · Dec 20, 2022, 11:06 PM · 147 points · 54 comments · 14 min read · 2 reviews
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic
Orpheus16 · Dec 20, 2022, 9:39 PM · 18 points · 2 comments · 11 min read
Discovering Language Model Behaviors with Model-Written Evaluations
evhub and Ethan Perez · Dec 20, 2022, 8:08 PM · 100 points · 34 comments · 1 min read · (www.anthropic.com)
Reflections: Bureaucratic Hell
Haris Rashid · Dec 20, 2022, 7:22 PM · −5 points · 1 comment · 1 min read · (www.harisrab.com)
Proliferating Education
Haris Rashid · Dec 20, 2022, 7:22 PM · −1 points · 2 comments · 5 min read · (www.harisrab.com)
AGI is here, but nobody wants it. Why should we even care?
MGow · Dec 20, 2022, 7:14 PM · −22 points · 0 comments · 17 min read
Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development
Roman Leventov · Dec 20, 2022, 5:13 PM · 33 points · 3 comments · 36 min read
I believe some AI doomers are overconfident
FTPickle · Dec 20, 2022, 5:09 PM · 8 points · 15 comments · 2 min read
Note on algorithms with multiple trained components
Steven Byrnes · Dec 20, 2022, 5:08 PM · 23 points · 4 comments · 2 min read
Marvel Snap: Phase 2
Zvi · Dec 20, 2022, 2:50 PM · 11 points · 1 comment · 13 min read · (thezvi.wordpress.com)
(Extremely) Naive Gradient Hacking Doesn’t Work
ojorgensen · Dec 20, 2022, 2:35 PM · 17 points · 0 comments · 6 min read
An Open Agency Architecture for Safe Transformative AI
davidad · Dec 20, 2022, 1:04 PM · 80 points · 22 comments · 4 min read
Under-Appreciated Ways to Use Flashcards—Part I
Florence Hinder · Dec 20, 2022, 12:43 PM · 22 points · 5 comments · 5 min read · (thoughtsaver.ghost.io)
EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)
Zoe Williams · Dec 20, 2022, 9:49 AM · 10 points · 0 comments
[link, 2019] AI paradigm: interactive learning from unlabeled instructions
the gears to ascension · Dec 20, 2022, 6:45 AM · 2 points · 0 comments · 2 min read · (jgrizou.github.io)
[Fiction] Unspoken Stone
Gordon Seidoh Worley · Dec 20, 2022, 5:11 AM · 19 points · 0 comments · 5 min read
Notice when you stop reading right before you understand
just_browsing · Dec 20, 2022, 5:09 AM · 61 points · 6 comments · 1 min read
Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.
Charlie Steiner · Dec 20, 2022, 5:01 AM · 25 points · 1 comment · 3 min read
More notes from raising a late-talking kid
Steven Byrnes · Dec 20, 2022, 2:13 AM · 40 points · 2 comments · 6 min read
The “Minimal Latents” Approach to Natural Abstractions
johnswentworth · Dec 20, 2022, 1:22 AM · 53 points · 24 comments · 12 min read
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC · Dec 19, 2022, 10:52 PM · 150 points · 30 comments · 18 min read