Discovering Language Model Behaviors with Model-Written Evaluations · evhub and Ethan Perez · Dec 20, 2022, 8:08 PM · 100 points · 34 comments · 1 min read · (www.anthropic.com)
Reflections: Bureaucratic Hell · Haris Rashid · Dec 20, 2022, 7:22 PM · −5 points · 1 comment · 1 min read · (www.harisrab.com)
Proliferating Education · Haris Rashid · Dec 20, 2022, 7:22 PM · −1 points · 2 comments · 5 min read · (www.harisrab.com)
AGI is here, but nobody wants it. Why should we even care? · MGow · Dec 20, 2022, 7:14 PM · −22 points · 0 comments · 17 min read
Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development · Roman Leventov · Dec 20, 2022, 5:13 PM · 33 points · 3 comments · 36 min read
I believe some AI doomers are overconfident · FTPickle · Dec 20, 2022, 5:09 PM · 8 points · 15 comments · 2 min read
Note on algorithms with multiple trained components · Steven Byrnes · Dec 20, 2022, 5:08 PM · 23 points · 4 comments · 2 min read
Marvel Snap: Phase 2 · Zvi · Dec 20, 2022, 2:50 PM · 11 points · 1 comment · 13 min read · (thezvi.wordpress.com)
(Extremely) Naive Gradient Hacking Doesn’t Work · ojorgensen · Dec 20, 2022, 2:35 PM · 17 points · 0 comments · 6 min read
An Open Agency Architecture for Safe Transformative AI · davidad · Dec 20, 2022, 1:04 PM · 80 points · 22 comments · 4 min read
Under-Appreciated Ways to Use Flashcards—Part I · Florence Hinder · Dec 20, 2022, 12:43 PM · 22 points · 5 comments · 5 min read · (thoughtsaver.ghost.io)
EA & LW Forums Weekly Summary (12th Dec - 18th Dec 22') · Zoe Williams · Dec 20, 2022, 9:49 AM · 10 points · 0 comments
[link, 2019] AI paradigm: interactive learning from unlabeled instructions · the gears to ascension · Dec 20, 2022, 6:45 AM · 2 points · 0 comments · 2 min read · (jgrizou.github.io)
[Fiction] Unspoken Stone · Gordon Seidoh Worley · Dec 20, 2022, 5:11 AM · 19 points · 0 comments · 5 min read
Notice when you stop reading right before you understand · just_browsing · Dec 20, 2022, 5:09 AM · 61 points · 6 comments · 1 min read
Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · Dec 20, 2022, 5:01 AM · 25 points · 1 comment · 3 min read
More notes from raising a late-talking kid · Steven Byrnes · Dec 20, 2022, 2:13 AM · 40 points · 2 comments · 6 min read
The “Minimal Latents” Approach to Natural Abstractions · johnswentworth · Dec 20, 2022, 1:22 AM · 53 points · 24 comments · 12 min read
Shard Theory in Nine Theses: a Distillation and Critical Appraisal · LawrenceC · Dec 19, 2022, 10:52 PM · 150 points · 30 comments · 18 min read
[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments · Yann Dubois · Dec 19, 2022, 10:42 PM · 5 points · 6 comments · 1 min read
AGI Timelines in Governance: Different Strategies for Different Timeframes · simeon_c and AmberDawn · Dec 19, 2022, 9:31 PM · 65 points · 28 comments · 10 min read
Towards Hodge-podge Alignment · Cleo Nardo · Dec 19, 2022, 8:12 PM · 95 points · 30 comments · 9 min read
Computational signatures of psychopathy · Cameron Berg · Dec 19, 2022, 5:01 PM · 30 points · 3 comments · 20 min read
Results from a survey on tool use and workflows in alignment research · jacquesthibs, Jan, janus and Logan Riggs · Dec 19, 2022, 3:19 PM · 79 points · 2 comments · 19 min read
Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.] · Bill Benzon · Dec 19, 2022, 3:12 PM · 13 points · 5 comments · 4 min read · (new-savanna.blogspot.com)
Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma · Jim Buhler · Dec 19, 2022, 3:00 PM · 24 points · 4 comments · 5 min read
Next Level Seinfeld · Zvi · Dec 19, 2022, 1:30 PM · 50 points · 8 comments · 1 min read · (thezvi.wordpress.com)
CEA Disambiguation · jefftk · Dec 19, 2022, 1:20 PM · 25 points · 0 comments · 1 min read · (www.jefftk.com)
Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) · Remmelt · Dec 19, 2022, 12:02 PM · −3 points · 9 comments · 31 min read
Hacker-AI and Cyberwar 2.0+ · Erland Wittkotter · Dec 19, 2022, 11:46 AM · 2 points · 0 comments · 15 min read
Non-Technical Preparation for Hacker-AI and Cyberwar 2.0+ · Erland Wittkotter · Dec 19, 2022, 11:42 AM · 2 points · 0 comments · 25 min read
An Effective Grab Bag · stavros · Dec 19, 2022, 10:29 AM · 28 points · 2 comments · 7 min read
Slick hyperfinite Ramsey theory proof · Alok Singh · Dec 19, 2022, 8:40 AM · 8 points · 3 comments · 1 min read · (alok.github.io)
The True Spirit of Solstice? · Raemon · Dec 19, 2022, 8:00 AM · 69 points · 31 comments · 9 min read
The Risk of Orbital Debris and One (Cheap) Way to Mitigate It · clans · Dec 19, 2022, 3:16 AM · 13 points · 1 comment · 4 min read · (locationtbd.home.blog)
Why I think that teaching philosophy is high impact · Eleni Angelou · Dec 19, 2022, 3:11 AM · 5 points · 0 comments · 2 min read
A template for doing annual reviews · peterslattery · Dec 19, 2022, 3:09 AM · 2 points · 0 comments · 1 min read
Event [Berkeley]: Alignment Collaborator Speed-Meeting · AlexMennen and Carson Jones · Dec 19, 2022, 2:24 AM · 18 points · 2 comments · 1 min read
An easier(?) end to the electoral college · ejacob · Dec 19, 2022, 2:09 AM · 2 points · 2 comments · 2 min read
How Death Feels · sisyphus · Dec 18, 2022, 11:47 PM · −7 points · 9 comments · 1 min read
Why Are Women Hot? · Jacob Falkovich · Dec 18, 2022, 11:20 PM · 17 points · 19 comments · 11 min read
[Question] Can we, in principle, know the measure of counterfactual quantum branches? · sisyphus · Dec 18, 2022, 10:07 PM · 1 point · 15 comments · 1 min read
Boston Solstice 2022 Retrospective · jefftk · Dec 18, 2022, 7:00 PM · 19 points · 3 comments · 5 min read · (www.jefftk.com)
Take 11: “Aligning language models” should be weirder. · Charlie Steiner · Dec 18, 2022, 2:14 PM · 34 points · 0 comments · 2 min read
Bad at Arithmetic, Promising at Math · cohenmacaulay · Dec 18, 2022, 5:40 AM · 100 points · 19 comments · 20 min read · 1 review
Overconfidence bubbles · kaputmi · Dec 18, 2022, 2:07 AM · 3 points · 0 comments · 2 min read
Positive values seem more robust and lasting than prohibitions · TurnTrout · Dec 17, 2022, 9:43 PM · 52 points · 13 comments · 2 min read
What we owe the microbiome · weverka · Dec 17, 2022, 7:40 PM · 2 points · 0 comments · 1 min read · (forum.effectivealtruism.org)
Why write more: improve your epistemics, self-care, & 28 other reasons · KatWoods · Dec 17, 2022, 7:25 PM · 24 points · 1 comment · 6 min read
Looking for an alignment tutor · JanB · Dec 17, 2022, 7:08 PM · 15 points · 2 comments · 1 min read