Archive: Page 2
Introducing Shrubgrazer · jefftk · Dec 16, 2022, 2:50 PM · 22 points · 0 comments · 2 min read · LW link (www.jefftk.com)
Paper: Transformers learn in-context by gradient descent · LawrenceC · Dec 16, 2022, 11:10 AM · 28 points · 11 comments · 2 min read · LW link (arxiv.org)
Will Machines Ever Rule the World? MLAISU W50 · Esben Kran · Dec 16, 2022, 11:03 AM · 12 points · 7 comments · 4 min read · LW link (newsletter.apartresearch.com)
AI overhangs depend on whether algorithms, compute and data are substitutes or complements · NathanBarnard · Dec 16, 2022, 2:23 AM · 4 points · 0 comments · 3 min read · LW link
AI Safety Movement Builders should help the community to optimise three factors: contributors, contributions and coordination · peterslattery · Dec 15, 2022, 10:50 PM · 4 points · 0 comments · 6 min read · LW link
Masking to Avoid Missing Things · jefftk · Dec 15, 2022, 9:00 PM · 17 points · 2 comments · 1 min read · LW link (www.jefftk.com)
Consider working more hours and taking more stimulants · Arjun Panickssery · Dec 15, 2022, 8:38 PM · 33 points · 11 comments · LW link
We’ve stepped over the threshold into the Fourth Arena, but don’t recognize it · Bill Benzon · Dec 15, 2022, 8:22 PM · 2 points · 0 comments · 7 min read · LW link
[Question] How is ARC planning to use ELK? · jacquesthibs · Dec 15, 2022, 8:11 PM · 24 points · 5 comments · 1 min read · LW link
How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme · Collin · Dec 15, 2022, 6:22 PM · 244 points · 39 comments · 16 min read · LW link · 1 review
High-level hopes for AI alignment · HoldenKarnofsky · Dec 15, 2022, 6:00 PM · 58 points · 3 comments · 19 min read · LW link (www.cold-takes.com)
Two Dogmas of LessWrong · omnizoid · Dec 15, 2022, 5:56 PM · −7 points · 155 comments · 69 min read · LW link
Covid 12/15/22: China’s Wave Begins · Zvi · Dec 15, 2022, 4:20 PM · 32 points · 7 comments · 10 min read · LW link (thezvi.wordpress.com)
The next decades might be wild · Marius Hobbhahn · Dec 15, 2022, 4:10 PM · 175 points · 42 comments · 41 min read · LW link · 1 review
Basic building blocks of dependent type theory · Thomas Kehrenberg · Dec 15, 2022, 2:54 PM · 49 points · 9 comments · 13 min read · LW link
AI Neorealism: a threat model & success criterion for existential safety · davidad · Dec 15, 2022, 1:42 PM · 67 points · 1 comment · 3 min read · LW link
Who should write the definitive post on Ziz? · Nicholas / Heather Kross · Dec 15, 2022, 6:37 AM · 4 points · 45 comments · 3 min read · LW link
[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018? · Chris_Leong · Dec 14, 2022, 11:28 PM · 8 points · 0 comments · 1 min read · LW link
«Boundaries», Part 3b: Alignment problems in terms of boundaries · Andrew_Critch · Dec 14, 2022, 10:34 PM · 72 points · 7 comments · 13 min read · LW link
Aligning alignment with performance · Marv K · Dec 14, 2022, 10:19 PM · 2 points · 0 comments · 2 min read · LW link
Contrary to List of Lethality’s point 22, alignment’s door number 2 · False Name · Dec 14, 2022, 10:01 PM · −2 points · 5 comments · 22 min read · LW link
Kolmogorov Complexity and Simulation Hypothesis · False Name · Dec 14, 2022, 10:01 PM · −3 points · 0 comments · 7 min read · LW link
[Question] Stanley Meyer’s water fuel cell · mikbp · Dec 14, 2022, 9:19 PM · 2 points · 6 comments · 1 min read · LW link
[Question] Is the AI timeline too short to have children? · Yoreth · Dec 14, 2022, 6:32 PM · 38 points · 20 comments · 1 min read · LW link
Predicting GPU performance · Marius Hobbhahn and Tamay · Dec 14, 2022, 4:27 PM · 60 points · 26 comments · 1 min read · LW link (epochai.org)
[Incomplete] What is Computation Anyway? · DragonGod · Dec 14, 2022, 4:17 PM · 16 points · 1 comment · 13 min read · LW link (arxiv.org)
Chair Hanging Peg · jefftk · Dec 14, 2022, 3:30 PM · 11 points · 0 comments · 1 min read · LW link (www.jefftk.com)
My AGI safety research—2022 review, ’23 plans · Steven Byrnes · Dec 14, 2022, 3:15 PM · 51 points · 10 comments · 7 min read · LW link
Extracting and Evaluating Causal Direction in LLMs’ Activations · Fabien Roger and simeon_c · Dec 14, 2022, 2:33 PM · 29 points · 5 comments · 11 min read · LW link
Key Mostly Outward-Facing Facts From the Story of VaccinateCA · Zvi · Dec 14, 2022, 1:30 PM · 61 points · 2 comments · 23 min read · LW link (thezvi.wordpress.com)
Discovering Latent Knowledge in Language Models Without Supervision · Xodarap · Dec 14, 2022, 12:32 PM · 45 points · 1 comment · 1 min read · LW link (arxiv.org)
[Question] COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition) · Lao Mein · Dec 14, 2022, 10:31 AM · 20 points · 11 comments · 1 min read · LW link
Beyond a better world · Davidmanheim · Dec 14, 2022, 10:18 AM · 14 points · 7 comments · 4 min read · LW link (progressforum.org)
Proof as mere strong evidence · adamShimi · Dec 14, 2022, 8:56 AM · 28 points · 16 comments · 2 min read · LW link (epistemologicalvigilance.substack.com)
Trying to disambiguate different questions about whether RLHF is “good” · Buck · Dec 14, 2022, 4:03 AM · 108 points · 47 comments · 7 min read · LW link · 1 review
[Question] How can one literally buy time (from x-risk) with money? · Alex_Altair · Dec 13, 2022, 7:24 PM · 24 points · 3 comments · 1 min read · LW link
[Question] Best introductory overviews of AGI safety? · JakubK · Dec 13, 2022, 7:01 PM · 21 points · 9 comments · 2 min read · LW link (forum.effectivealtruism.org)
Applications open for AGI Safety Fundamentals: Alignment Course · Richard_Ngo · Dec 13, 2022, 6:31 PM · 49 points · 0 comments · 2 min read · LW link
What Does It Mean to Align AI With Human Values? · Algon · Dec 13, 2022, 4:56 PM · 8 points · 3 comments · 1 min read · LW link (www.quantamagazine.org)
It Takes Two Paracetamol? · Eli_ · Dec 13, 2022, 4:29 PM · 33 points · 10 comments · 2 min read · LW link
[Interim research report] Taking features out of superposition with sparse autoencoders · Lee Sharkey, Dan Braun and beren · Dec 13, 2022, 3:41 PM · 150 points · 23 comments · 22 min read · LW link · 2 reviews
[Question] Is the ChatGPT-simulated Linux virtual machine real? · Kenoubi · Dec 13, 2022, 3:41 PM · 18 points · 7 comments · 1 min read · LW link
Existential AI Safety is NOT separate from near-term applications · scasper · Dec 13, 2022, 2:47 PM · 37 points · 17 comments · 3 min read · LW link
What is the correlation between upvoting and benefit to readers of LW? · banev · Dec 13, 2022, 2:26 PM · 7 points · 15 comments · 1 min read · LW link
Limits of Superintelligence · Aleksei Petrenko · Dec 13, 2022, 12:19 PM · 1 point · 5 comments · 1 min read · LW link
Bay 2022 Solstice · Raemon · Dec 13, 2022, 8:58 AM · 17 points · 0 comments · 1 min read · LW link
Last day to nominate things for the Review. Also, 2019 books still exist. · Raemon · Dec 13, 2022, 8:53 AM · 15 points · 0 comments · 1 min read · LW link
AI alignment is distinct from its near-term applications · paulfchristiano · Dec 13, 2022, 7:10 AM · 255 points · 21 comments · 2 min read · LW link (ai-alignment.com)
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying. · Charlie Steiner · Dec 13, 2022, 7:04 AM · 37 points · 3 comments · 2 min read · LW link
[Question] Are lawsuits against AGI companies extending AGI timelines? · SlowingAGI · Dec 13, 2022, 6:00 AM · 1 point · 1 comment · 1 min read · LW link