Update to Mysteries of mode collapse: text-davinci-002 not RLHF
janus
Nov 19, 2022, 11:51 PM
71
points
8
comments
2
min read
LW
link
Make the Drought Evaporate!
AnthonyRepetto
Nov 19, 2022, 11:41 PM
32
points
25
comments
3
min read
LW
link
Elastic Productivity Tools
Simon Berens
Nov 19, 2022, 9:59 PM
76
points
8
comments
2
min read
LW
link
(simonberens.me)
A Short Dialogue on the Meaning of Reward Functions
Leon Lang
,
Quintin Pope
and
peligrietzer
Nov 19, 2022, 9:04 PM
45
points
0
comments
3
min read
LW
link
By Default, GPTs Think In Plain Sight
Fabien Roger
Nov 19, 2022, 7:15 PM
88
points
36
comments
9
min read
LW
link
Review: Bayesian Statistics the Fun Way by Will Kurt
matto
Nov 19, 2022, 6:52 PM
4
points
2
comments
2
min read
LW
link
[Question]
How does acausal trade work in a deterministic multiverse?
sisyphus
Nov 19, 2022, 1:50 AM
2
points
13
comments
1
min read
LW
link
Choosing the right dish
Adam Zerner
Nov 19, 2022, 1:38 AM
38
points
7
comments
8
min read
LW
link
Reflective Consequentialism
Adam Zerner
Nov 18, 2022, 11:56 PM
21
points
14
comments
4
min read
LW
link
Value Created vs. Value Extracted
Sable
Nov 18, 2022, 9:34 PM
8
points
6
comments
6
min read
LW
link
(affablyevil.substack.com)
The Disastrously Confident And Inaccurate AI
Sharat Jacob Jacob
Nov 18, 2022, 7:06 PM
13
points
0
comments
13
min read
LW
link
How AI Fails Us: A non-technical view of the Alignment Problem
testingthewaters
Nov 18, 2022, 7:02 PM
7
points
1
comment
2
min read
LW
link
(ethics.harvard.edu)
[Question]
Is there any policy for a fair treatment of AIs whose friendliness is in doubt?
nahoj
Nov 18, 2022, 7:01 PM
15
points
10
comments
1
min read
LW
link
Distillation of “How Likely Is Deceptive Alignment?”
NickGabs
Nov 18, 2022, 4:31 PM
24
points
4
comments
10
min read
LW
link
Contra Chords
jefftk
Nov 18, 2022, 4:20 PM
12
points
1
comment
7
min read
LW
link
(www.jefftk.com)
[Question]
Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’
Nick_Greig
Nov 18, 2022, 12:46 PM
15
points
2
comments
1
min read
LW
link
Halifax, NS – Monthly Rationalist, EA, and ACX Meetup
Ideopunk
Nov 18, 2022, 11:45 AM
10
points
0
comments
1
min read
LW
link
Introducing The Logical Foundation, A Plan to End Poverty With Guaranteed Income
Michael Simm
Nov 18, 2022, 8:13 AM
9
points
23
comments
LW
link
My Deontology Says Narrow-Mindedness is Always Wrong
LVSN
Nov 18, 2022, 6:11 AM
6
points
2
comments
1
min read
LW
link
AI Ethics != Ai Safety
Dentin
Nov 18, 2022, 3:02 AM
2
points
0
comments
1
min read
LW
link
Don’t design agents which exploit adversarial inputs
TurnTrout
and
Garrett Baker
Nov 18, 2022, 1:48 AM
72
points
64
comments
12
min read
LW
link
Engineering Monosemanticity in Toy Models
Adam Jermyn
,
evhub
and
Nicholas Schiefer
Nov 18, 2022, 1:43 AM
75
points
7
comments
3
min read
LW
link
(arxiv.org)
AGIs may value intrinsic rewards more than extrinsic ones
catubc
Nov 17, 2022, 9:49 PM
8
points
6
comments
4
min read
LW
link
LLMs may capture key components of human agency
catubc
Nov 17, 2022, 8:14 PM
27
points
0
comments
4
min read
LW
link
Mastodon Replies as Comments
jefftk
Nov 17, 2022, 8:10 PM
20
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Announcing the Progress Forum
jasoncrawford
Nov 17, 2022, 7:26 PM
83
points
9
comments
1
min read
LW
link
[Question]
What kind of bias is this?
Daniel Samuel
Nov 17, 2022, 6:44 PM
3
points
2
comments
1
min read
LW
link
AI Forecasting Research Ideas
Jsevillamol
Nov 17, 2022, 5:37 PM
21
points
2
comments
LW
link
Results from the interpretability hackathon
Esben Kran
and
Neel Nanda
Nov 17, 2022, 2:51 PM
81
points
0
comments
6
min read
LW
link
(alignmentjam.com)
Covid 11/17/22: Slow Recovery
Zvi
Nov 17, 2022, 2:50 PM
33
points
3
comments
4
min read
LW
link
(thezvi.wordpress.com)
Sadly, FTX
Zvi
Nov 17, 2022, 2:30 PM
133
points
18
comments
47
min read
LW
link
(thezvi.wordpress.com)
Deontology and virtue ethics as “effective theories” of consequentialist ethics
Jan_Kulveit
Nov 17, 2022, 2:11 PM
68
points
9
comments
LW
link
1
review
The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)
Jessica Rumbelow
Nov 17, 2022, 11:06 AM
27
points
2
comments
2
min read
LW
link
[Question]
[Personal Question] Can anyone help me navigate this potentially painful interpersonal dynamic rationally?
SlainLadyMondegreen
Nov 17, 2022, 8:53 AM
9
points
3
comments
4
min read
LW
link
Massive Scaling Should be Frowned Upon
harsimony
Nov 17, 2022, 8:43 AM
4
points
6
comments
5
min read
LW
link
[Question]
Why are profitable companies laying off staff?
Yair Halberstadt
Nov 17, 2022, 6:19 AM
15
points
10
comments
1
min read
LW
link
Discussion: Was SBF a naive utilitarian, or a sociopath?
Nicholas / Heather Kross
Nov 17, 2022, 2:52 AM
0
points
4
comments
LW
link
Kelsey Piper’s recent interview of SBF
agucova
Nov 16, 2022, 8:30 PM
51
points
29
comments
LW
link
The Echo Principle
Jonathan Moregård
Nov 16, 2022, 8:09 PM
4
points
0
comments
3
min read
LW
link
(honestliving.substack.com)
[Question]
Is there some reason LLMs haven’t seen broader use?
tailcalled
Nov 16, 2022, 8:04 PM
25
points
27
comments
1
min read
LW
link
When should we be surprised that an invention took “so long”?
jasoncrawford
Nov 16, 2022, 8:04 PM
32
points
11
comments
4
min read
LW
link
(rootsofprogress.org)
Questions about Value Lock-in, Paternalism, and Empowerment
Sam F. Brown
Nov 16, 2022, 3:33 PM
13
points
2
comments
12
min read
LW
link
(sambrown.eu)
If Professional Investors Missed This...
jefftk
Nov 16, 2022, 3:00 PM
37
points
18
comments
3
min read
LW
link
(www.jefftk.com)
Disagreement with bio anchors that lead to shorter timelines
Marius Hobbhahn
16 Nov 2022 14:40 UTC
75
points
17
comments
7
min read
LW
link
1
review
Current themes in mechanistic interpretability research
Lee Sharkey
,
Sid Black
and
beren
16 Nov 2022 14:14 UTC
89
points
2
comments
12
min read
LW
link
Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight
Jacy Reese Anthis
16 Nov 2022 13:54 UTC
31
points
9
comments
2
min read
LW
link
Miracles and why not to believe them
mruwnik
16 Nov 2022 12:07 UTC
4
points
0
comments
2
min read
LW
link
[Question]
How do people do remote research collaborations effectively?
Krieger
16 Nov 2022 11:51 UTC
8
points
0
comments
1
min read
LW
link
Method of statements: an alternative to taboo
Q Home
16 Nov 2022 10:57 UTC
7
points
0
comments
41
min read
LW
link
The two conceptions of Active Inference: an intelligence architecture and a theory of agency
Roman Leventov
16 Nov 2022 9:30 UTC
17
points
0
comments
4
min read
LW
link