Recreating logic in type theory
Thomas Kehrenberg
Dec 21, 2022, 3:19 PM
18
points
0
comments
13
min read
LW
link
You become the UI you use
Viliam
Dec 21, 2022, 3:04 PM
21
points
7
comments
2
min read
LW
link
Price’s equation for neural networks
tailcalled
Dec 21, 2022, 1:09 PM
29
points
4
comments
2
min read
LW
link
Decisions: Ontologically Shifting to Determinism
Chris_Leong
Dec 21, 2022, 12:41 PM
8
points
11
comments
6
min read
LW
link
A Comprehensive Mechanistic Interpretability Explainer & Glossary
Neel Nanda
Dec 21, 2022, 12:35 PM
91
points
6
comments
2
min read
LW
link
(neelnanda.io)
Google Search loses to ChatGPT fair and square
Shmi
Dec 21, 2022, 8:11 AM
14
points
17
comments
1
min read
LW
link
(www.surgehq.ai)
Sazen
Duncan Sabien (Deactivated)
Dec 21, 2022, 7:54 AM
285
points
83
comments
12
min read
LW
link
2
reviews
Podcast: What’s Wrong With LessWrong
Alfred
Dec 21, 2022, 7:06 AM
−32
points
11
comments
1
min read
LW
link
(youtu.be)
New AI risk intro from Vox [link post]
JakubK
Dec 21, 2022, 6:00 AM
5
points
1
comment
2
min read
LW
link
(www.vox.com)
Local Memes Against Geometric Rationality
Scott Garrabrant
Dec 21, 2022, 3:53 AM
90
points
3
comments
6
min read
LW
link
Logging Shell History in Zsh
jefftk
Dec 21, 2022, 3:30 AM
19
points
2
comments
1
min read
LW
link
(www.jefftk.com)
CIRL Corrigibility is Fragile
Rachel Freedman and AdamGleave
Dec 21, 2022, 1:40 AM
58
points
8
comments
12
min read
LW
link
[Question] [DISC] Are Values Robust?
DragonGod
Dec 21, 2022, 1:00 AM
12
points
9
comments
2
min read
LW
link
Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values
Garrett Baker
Dec 21, 2022, 12:44 AM
9
points
10
comments
5
min read
LW
link
Progress links and tweets, 2022-12-20
jasoncrawford
Dec 21, 2022, 12:35 AM
12
points
0
comments
2
min read
LW
link
(rootsofprogress.org)
K-complexity is silly; use cross-entropy instead
So8res
Dec 20, 2022, 11:06 PM
147
points
54
comments
14
min read
LW
link
2
reviews
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic
Orpheus16
Dec 20, 2022, 9:39 PM
18
points
2
comments
11
min read
LW
link
Discovering Language Model Behaviors with Model-Written Evaluations
evhub and Ethan Perez
Dec 20, 2022, 8:08 PM
100
points
34
comments
1
min read
LW
link
(www.anthropic.com)
Reflections: Bureaucratic Hell
Haris Rashid
Dec 20, 2022, 7:22 PM
−5
points
1
comment
1
min read
LW
link
(www.harisrab.com)
Proliferating Education
Haris Rashid
Dec 20, 2022, 7:22 PM
−1
points
2
comments
5
min read
LW
link
(www.harisrab.com)
AGI is here, but nobody wants it. Why should we even care?
MGow
Dec 20, 2022, 7:14 PM
−22
points
0
comments
17
min read
LW
link
Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development
Roman Leventov
Dec 20, 2022, 5:13 PM
33
points
3
comments
36
min read
LW
link
I believe some AI doomers are overconfident
FTPickle
Dec 20, 2022, 5:09 PM
8
points
15
comments
2
min read
LW
link
Note on algorithms with multiple trained components
Steven Byrnes
Dec 20, 2022, 5:08 PM
23
points
4
comments
2
min read
LW
link
Marvel Snap: Phase 2
Zvi
Dec 20, 2022, 2:50 PM
11
points
1
comment
13
min read
LW
link
(thezvi.wordpress.com)
(Extremely) Naive Gradient Hacking Doesn’t Work
ojorgensen
Dec 20, 2022, 2:35 PM
17
points
0
comments
6
min read
LW
link
An Open Agency Architecture for Safe Transformative AI
davidad
Dec 20, 2022, 1:04 PM
80
points
22
comments
4
min read
LW
link
Under-Appreciated Ways to Use Flashcards—Part I
Florence Hinder
Dec 20, 2022, 12:43 PM
22
points
5
comments
5
min read
LW
link
(thoughtsaver.ghost.io)
EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)
Zoe Williams
Dec 20, 2022, 9:49 AM
10
points
0
comments
LW
link
[link, 2019] AI paradigm: interactive learning from unlabeled instructions
the gears to ascension
Dec 20, 2022, 6:45 AM
2
points
0
comments
2
min read
LW
link
(jgrizou.github.io)
[Fiction] Unspoken Stone
Gordon Seidoh Worley
Dec 20, 2022, 5:11 AM
19
points
0
comments
5
min read
LW
link
Notice when you stop reading right before you understand
just_browsing
Dec 20, 2022, 5:09 AM
61
points
6
comments
1
min read
LW
link
Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.
Charlie Steiner
Dec 20, 2022, 5:01 AM
25
points
1
comment
3
min read
LW
link
More notes from raising a late-talking kid
Steven Byrnes
Dec 20, 2022, 2:13 AM
40
points
2
comments
6
min read
LW
link
The “Minimal Latents” Approach to Natural Abstractions
johnswentworth
Dec 20, 2022, 1:22 AM
53
points
24
comments
12
min read
LW
link
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC
Dec 19, 2022, 10:52 PM
150
points
30
comments
18
min read
LW
link
[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois
Dec 19, 2022, 10:42 PM
5
points
6
comments
1
min read
LW
link
AGI Timelines in Governance: Different Strategies for Different Timeframes
simeon_c and AmberDawn
Dec 19, 2022, 9:31 PM
65
points
28
comments
10
min read
LW
link
Towards Hodge-podge Alignment
Cleo Nardo
Dec 19, 2022, 8:12 PM
95
points
30
comments
9
min read
LW
link
Computational signatures of psychopathy
Cameron Berg
Dec 19, 2022, 5:01 PM
30
points
3
comments
20
min read
LW
link
Results from a survey on tool use and workflows in alignment research
jacquesthibs, Jan, janus and Logan Riggs
Dec 19, 2022, 3:19 PM
79
points
2
comments
19
min read
LW
link
Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]
Bill Benzon
Dec 19, 2022, 3:12 PM
13
points
5
comments
4
min read
LW
link
(new-savanna.blogspot.com)
Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma
Jim Buhler
19 Dec 2022 15:00 UTC
24
points
4
comments
5
min read
LW
link
Next Level Seinfeld
Zvi
19 Dec 2022 13:30 UTC
50
points
8
comments
1
min read
LW
link
(thezvi.wordpress.com)
CEA Disambiguation
jefftk
19 Dec 2022 13:20 UTC
25
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)
Remmelt
19 Dec 2022 12:02 UTC
−3
points
9
comments
31
min read
LW
link
Hacker-AI and Cyberwar 2.0+
Erland Wittkotter
19 Dec 2022 11:46 UTC
2
points
0
comments
15
min read
LW
link
Non-Technical Preparation for Hacker-AI and Cyberwar 2.0+
Erland Wittkotter
19 Dec 2022 11:42 UTC
2
points
0
comments
25
min read
LW
link
An Effective Grab Bag
stavros
19 Dec 2022 10:29 UTC
28
points
2
comments
7
min read
LW
link
Slick hyperfinite Ramsey theory proof
Alok Singh
19 Dec 2022 8:40 UTC
8
points
3
comments
1
min read
LW
link
(alok.github.io)