Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility · Orpheus16 and OliviaJ · Nov 22, 2022, 10:19 PM · 73 points, 20 comments, 4 min read
ACX Zurich November Meetup · MB · Nov 22, 2022, 9:41 PM · 1 point, 0 comments, 1 min read
Human-level Full-Press Diplomacy (some bare facts). · Cleo Nardo · Nov 22, 2022, 8:59 PM · 50 points, 7 comments, 3 min read
[Question] How does late-2022 COVID transmissibility drop over time? · Daniel Dewey · Nov 22, 2022, 7:54 PM · 8 points, 2 comments, 1 min read
AI will change the world, but won’t take it over by playing “3-dimensional chess”. · boazbarak and benedelman · Nov 22, 2022, 6:57 PM · 134 points, 97 comments, 24 min read
Progress links and tweets, 2022-11-22 · jasoncrawford · Nov 22, 2022, 5:39 PM · 17 points, 0 comments, 1 min read · (rootsofprogress.org)
Tyranny of the Epistemic Majority · Scott Garrabrant · Nov 22, 2022, 5:19 PM · 192 points, 13 comments, 9 min read · 1 review
A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2 · Neel Nanda · Nov 22, 2022, 5:12 PM · 20 points, 0 comments, 1 min read · (www.youtube.com)
Simple Improvement to College Football Overtime Rules · Zvi · Nov 22, 2022, 5:00 PM · 10 points, 0 comments, 1 min read · (thezvi.wordpress.com)
Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · Jacy Reese Anthis · Nov 22, 2022, 4:50 PM · 93 points, 64 comments, 1 min read · (www.science.org)
Austin LW meetup notes: The FTX Affair · jchan · Nov 22, 2022, 2:01 PM · 20 points, 3 comments, 16 min read
Motivated Cognition and the Multiverse of Truth · Q Home · Nov 22, 2022, 12:51 PM · 8 points, 16 comments, 24 min read
LessWrong readers are invited to apply to the Lurkshop · Jonas V and GradientDissenter · Nov 22, 2022, 9:19 AM · 101 points, 41 comments, 3 min read
Gaoxing Guy · Alok Singh · Nov 22, 2022, 1:50 AM · 3 points, 1 comment, 1 min read · (alok.github.io)
Miscellaneous First-Pass Alignment Thoughts · NickGabs · Nov 21, 2022, 9:23 PM · 12 points, 4 comments, 10 min read
[Hebbian Natural Abstractions] Introduction · Samuel Nellessen and Jan · Nov 21, 2022, 8:34 PM · 34 points, 3 comments, 4 min read · (www.snellessen.com)
Utilitarianism Meets Egalitarianism · Scott Garrabrant · Nov 21, 2022, 7:00 PM · 121 points, 16 comments, 6 min read · 1 review
Interview with Matt Freeman · Evenflair · Nov 21, 2022, 6:17 PM · 15 points, 0 comments, 1 min read · (overcast.fm)
Here’s the exit. · Valentine · Nov 21, 2022, 6:07 PM · 115 points, 180 comments, 10 min read · 5 reviews
Benefits/Risks of Scott Aaronson’s Orthodox/Reform Framing for AI Alignment · Jeremyy · Nov 21, 2022, 5:54 PM · 2 points, 1 comment
[ASoT] Reflectivity in Narrow AI · Ulisse Mini · Nov 21, 2022, 12:51 AM · 6 points, 1 comment, 1 min read
Scott Aaronson on “Reform AI Alignment” · Shmi · Nov 20, 2022, 10:20 PM · 39 points, 17 comments, 1 min read · (scottaaronson.blog)
On Morality, Ethics, and all that Jazz · Delen Heisman · Nov 20, 2022, 8:00 PM · 4 points, 4 comments, 2 min read · (delen.substack.com)
Limits to the Controllability of AGI · Roman_Yampolskiy, Remmelt Ellen and Karl von Wendt · Nov 20, 2022, 7:18 PM · 10 points, 2 comments, 9 min read
Career Scouting: Dentistry · koratkar · Nov 20, 2022, 3:55 PM · 69 points, 5 comments, 5 min read · (careerscouting.substack.com)
Decision Theory but also Ghosts · eva_ · Nov 20, 2022, 1:24 PM · 17 points, 21 comments, 10 min read
ARC paper: Formalizing the presumption of independence · Erik Jenner · Nov 20, 2022, 1:22 AM · 97 points, 2 comments, 2 min read · (arxiv.org)
Update to Mysteries of mode collapse: text-davinci-002 not RLHF · janus · Nov 19, 2022, 11:51 PM · 71 points, 8 comments, 2 min read
Make the Drought Evaporate! · AnthonyRepetto · Nov 19, 2022, 11:41 PM · 32 points, 25 comments, 3 min read
Elastic Productivity Tools · Simon Berens · Nov 19, 2022, 9:59 PM · 76 points, 8 comments, 2 min read · (simonberens.me)
A Short Dialogue on the Meaning of Reward Functions · Leon Lang, Quintin Pope and peligrietzer · Nov 19, 2022, 9:04 PM · 45 points, 0 comments, 3 min read
By Default, GPTs Think In Plain Sight · Fabien Roger · Nov 19, 2022, 7:15 PM · 88 points, 36 comments, 9 min read
Review: Bayesian Statistics the Fun Way by Will Kurt · matto · Nov 19, 2022, 6:52 PM · 4 points, 2 comments, 2 min read
[Question] How does acausal trade work in a deterministic multiverse? · sisyphus · Nov 19, 2022, 1:50 AM · 2 points, 13 comments, 1 min read
Choosing the right dish · Adam Zerner · Nov 19, 2022, 1:38 AM · 38 points, 7 comments, 8 min read
Reflective Consequentialism · Adam Zerner · Nov 18, 2022, 11:56 PM · 21 points, 14 comments, 4 min read
Value Created vs. Value Extracted · Sable · Nov 18, 2022, 9:34 PM · 8 points, 6 comments, 6 min read · (affablyevil.substack.com)
The Disastrously Confident And Inaccurate AI · Sharat Jacob Jacob · Nov 18, 2022, 7:06 PM · 13 points, 0 comments, 13 min read
How AI Fails Us: A non-technical view of the Alignment Problem · testingthewaters · Nov 18, 2022, 7:02 PM · 7 points, 1 comment, 2 min read · (ethics.harvard.edu)
[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt? · nahoj · Nov 18, 2022, 7:01 PM · 15 points, 10 comments, 1 min read
Distillation of “How Likely Is Deceptive Alignment?” · NickGabs · Nov 18, 2022, 4:31 PM · 24 points, 4 comments, 10 min read
Contra Chords · jefftk · Nov 18, 2022, 4:20 PM · 12 points, 1 comment, 7 min read · (www.jefftk.com)
[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’ · Nick_Greig · Nov 18, 2022, 12:46 PM · 15 points, 2 comments, 1 min read
Halifax, NS – Monthly Rationalist, EA, and ACX Meetup · Ideopunk · Nov 18, 2022, 11:45 AM · 10 points, 0 comments, 1 min read
Introducing The Logical Foundation, A Plan to End Poverty With Guaranteed Income · Michael Simm · Nov 18, 2022, 8:13 AM · 9 points, 23 comments
My Deontology Says Narrow-Mindedness is Always Wrong · LVSN · Nov 18, 2022, 6:11 AM · 6 points, 2 comments, 1 min read
AI Ethics != Ai Safety · Dentin · Nov 18, 2022, 3:02 AM · 2 points, 0 comments, 1 min read
Don’t design agents which exploit adversarial inputs · TurnTrout and Garrett Baker · Nov 18, 2022, 1:48 AM · 72 points, 64 comments, 12 min read
Engineering Monosemanticity in Toy Models · Adam Jermyn, evhub and Nicholas Schiefer · Nov 18, 2022, 1:43 AM · 75 points, 7 comments, 3 min read · (arxiv.org)
AGIs may value intrinsic rewards more than extrinsic ones · catubc · Nov 17, 2022, 9:49 PM · 8 points, 6 comments, 4 min read