LessWrong Archive: November 2022, Page 3
[Question] Has anyone increased their AGI timelines? · Darren McKee · Nov 6, 2022, 12:03 AM · 39 points · 12 comments · 1 min read · LW link

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques · Vika, Vikrant Varma, Ramana Kumar and Rohin Shah · Nov 25, 2022, 2:36 PM · 39 points · 9 comments · 6 min read · LW link (vkrakovna.wordpress.com)

A caveat to the Orthogonality Thesis · Wuschel Schulz · Nov 9, 2022, 3:06 PM · 38 points · 10 comments · 2 min read · LW link

Internal communication framework · rosehadshar and Nora_Ammann · Nov 15, 2022, 12:41 PM · 38 points · 13 comments · 12 min read · LW link

Choosing the right dish · Adam Zerner · Nov 19, 2022, 1:38 AM · 38 points · 7 comments · 8 min read · LW link

[Question] Is there any discussion on avoiding being Dutch-booked or otherwise taken advantage of one's bounded rationality by refusing to engage? · Shmi · Nov 7, 2022, 2:36 AM · 38 points · 29 comments · 1 min read · LW link

How do I start a programming career in the West? · Lao Mein · Nov 25, 2022, 6:37 AM · 38 points · 7 comments · 2 min read · LW link

Feeling Old: Leaving your 20s in the 2020s · squidious · Nov 22, 2022, 10:50 PM · 37 points · 3 comments · 1 min read · LW link (opalsandbonobos.blogspot.com)

Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas · Orpheus16 · Nov 25, 2022, 8:47 PM · 37 points · 2 comments · 9 min read · LW link

Discussing how to align Transformative AI if it's developed very soon · elifland · Nov 28, 2022, 4:17 PM · 37 points · 2 comments · 28 min read · LW link

If Professional Investors Missed This... · jefftk · Nov 16, 2022, 3:00 PM · 37 points · 18 comments · 3 min read · LW link (www.jefftk.com)

Simulators, constraints, and goal agnosticism: porbynotes vol. 1 · porby · Nov 23, 2022, 4:22 AM · 37 points · 2 comments · 35 min read · LW link

User-Controlled Algorithmic Feeds · jefftk · Nov 12, 2022, 3:20 PM · 35 points · 7 comments · 2 min read · LW link (www.jefftk.com)

Some research ideas in forecasting · Jsevillamol · Nov 15, 2022, 7:47 PM · 35 points · 2 comments · LW link

Housing and Transit Thoughts #1 · Zvi · Nov 2, 2022, 12:10 PM · 35 points · 5 comments · 16 min read · LW link (thezvi.wordpress.com)

[Hebbian Natural Abstractions] Introduction · Samuel Nellessen and Jan · Nov 21, 2022, 8:34 PM · 34 points · 3 comments · 4 min read · LW link (www.snellessen.com)

Value Formation: An Overarching Model · Thane Ruthenis · Nov 15, 2022, 5:16 PM · 34 points · 20 comments · 34 min read · LW link

Solstice 2022 Roundup · dspeyer · Nov 12, 2022, 9:26 PM · 34 points · 12 comments · 1 min read · LW link

Ways to buy time · Orpheus16, OliviaJ and Thomas Larsen · Nov 12, 2022, 7:31 PM · 34 points · 23 comments · 12 min read · LW link

Weekly Roundup #5 · Zvi · Nov 11, 2022, 4:20 PM · 33 points · 0 comments · 6 min read · LW link (thezvi.wordpress.com)

Thinking About Mastodon · jefftk · Nov 7, 2022, 7:40 PM · 33 points · 17 comments · 1 min read · LW link (www.jefftk.com)

People care about each other even though they have imperfect motivational pointers? · TurnTrout · Nov 8, 2022, 6:15 PM · 33 points · 25 comments · 7 min read · LW link

Covid 11/17/22: Slow Recovery · Zvi · Nov 17, 2022, 2:50 PM · 33 points · 3 comments · 4 min read · LW link (thezvi.wordpress.com)

Auditing games for high-level interpretability · Paul Colognese · Nov 1, 2022, 10:44 AM · 33 points · 1 comment · 7 min read · LW link

Make the Drought Evaporate! · AnthonyRepetto · Nov 19, 2022, 11:41 PM · 32 points · 25 comments · 3 min read · LW link

Charging for the Dharma · jchan · Nov 11, 2022, 2:02 PM · 32 points · 18 comments · 5 min read · LW link

Why bet Kelly? · AlexMennen · Nov 15, 2022, 6:12 PM · 32 points · 14 comments · 5 min read · LW link

Review: LOVE in a simbox · PeterMcCluskey · Nov 27, 2022, 5:41 PM · 32 points · 4 comments · 9 min read · LW link (bayesianinvestor.com)

When should we be surprised that an invention took "so long"? · jasoncrawford · Nov 16, 2022, 8:04 PM · 32 points · 11 comments · 4 min read · LW link (rootsofprogress.org)

Covid 11/10/22: Into the Background · Zvi · Nov 10, 2022, 1:40 PM · 31 points · 5 comments · 4 min read · LW link (thezvi.wordpress.com)

Adversarial Policies Beat Professional-Level Go AIs · sanxiyn · Nov 3, 2022, 1:27 PM · 31 points · 35 comments · 1 min read · LW link (goattack.alignmentfund.org)

Unpacking "Shard Theory" as Hunch, Question, Theory, and Insight · Jacy Reese Anthis · Nov 16, 2022, 1:54 PM · 31 points · 9 comments · 2 min read · LW link

Gliders in Language Models · Alexandre Variengien · Nov 25, 2022, 12:38 AM · 30 points · 11 comments · 10 min read · LW link

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien) · Neel Nanda · Nov 7, 2022, 10:39 PM · 30 points · 15 comments · 3 min read · LW link (youtu.be)

What videos should Rational Animations make? · Writer · Nov 26, 2022, 8:28 PM · 30 points · 24 comments · LW link

The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · mako yass · Nov 3, 2022, 6:47 AM · 30 points · 13 comments · 10 min read · LW link

ML Safety Scholars Summer 2022 Retrospective · TW123 · Nov 1, 2022, 3:09 AM · 29 points · 0 comments · LW link

You won't solve alignment without agent foundations · Mikhail Samin · Nov 6, 2022, 8:07 AM · 29 points · 3 comments · 8 min read · LW link

Response · Jarred Filmer · Nov 6, 2022, 1:03 AM · 29 points · 2 comments · 12 min read · LW link

Good Futures Initiative: Winter Project Internship · Aris · Nov 27, 2022, 11:41 PM · 28 points · 4 comments · 4 min read · LW link

The economy as an analogy for advanced AI systems · rosehadshar and particlemania · Nov 15, 2022, 11:16 AM · 28 points · 0 comments · 5 min read · LW link

Mechanistic Interpretability as Reverse Engineering (follow-up to "cars and elephants") · David Scott Krueger (formerly: capybaralet) · Nov 3, 2022, 11:19 PM · 28 points · 3 comments · 1 min read · LW link

A short critique of Vanessa Kosoy's PreDCA · Martín Soto · Nov 13, 2022, 4:00 PM · 28 points · 8 comments · 4 min read · LW link

Semi-conductor/AI Stock Discussion. · sapphire · Nov 25, 2022, 11:35 PM · 28 points · 25 comments · 1 min read · LW link

Estimating the probability that FTX Future Fund grant money gets clawed back · spencerg · Nov 14, 2022, 3:33 AM · 28 points · 6 comments · LW link

Toy Models and Tegum Products · Adam Jermyn · Nov 4, 2022, 6:51 PM · 28 points · 7 comments · 5 min read · LW link

LLMs may capture key components of human agency · catubc · Nov 17, 2022, 8:14 PM · 27 points · 0 comments · 4 min read · LW link

Why I'm Working On Model Agnostic Interpretability · Jessica Rumbelow · Nov 11, 2022, 9:24 AM · 27 points · 9 comments · 2 min read · LW link

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) · Jessica Rumbelow · Nov 17, 2022, 11:06 AM · 27 points · 2 comments · 2 min read · LW link

Inverse scaling can become U-shaped · Edouard Harris · Nov 8, 2022, 7:04 PM · 27 points · 15 comments · 1 min read · LW link (arxiv.org)