Academic Papers
Tag
Last edit: 9 Jul 2020 11:36 UTC by Kaj_Sotala
Posts either linking to, or summarizing, formal papers published elsewhere.
Some AI research areas and their relevance to existential safety
Andrew_Critch (19 Nov 2020 3:18 UTC): 204 points, 37 comments, 50 min read, 2 reviews
Striking Implications for Learning Theory, Interpretability — and Safety?
RogerDearnaley (5 Jan 2024 8:46 UTC): 37 points, 4 comments, 2 min read
How to Control an LLM’s Behavior (why my P(DOOM) went down)
RogerDearnaley (28 Nov 2023 19:56 UTC): 64 points, 30 comments, 11 min read
Thirty-three randomly selected bioethics papers
Rob Bensinger (22 Mar 2021 21:38 UTC): 115 points, 46 comments, 50 min read
My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)
Robert_AIZI (27 Dec 2022 17:27 UTC): 50 points, 0 comments, 4 min read; linkpost: aizi.substack.com
SSC Journal Club: AI Timelines
Scott Alexander (8 Jun 2017 19:00 UTC): 15 points, 16 comments, 8 min read
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner (4 Jun 2024 15:50 UTC): 120 points, 14 comments, 13 min read
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L, bilalchughtai, jan betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans and Marius Hobbhahn (8 Jul 2024 22:24 UTC): 103 points, 28 comments, 5 min read
New paper: Long-Term Trajectories of Human Civilization
Kaj_Sotala (12 Aug 2018 9:10 UTC): 33 points, 1 comment, 2 min read; linkpost: kajsotala.fi
Study on what makes people approve or condemn mind upload technology; references LW
Kaj_Sotala (10 Jul 2018 17:14 UTC): 22 points, 0 comments, 2 min read; linkpost: www.nature.com
AGI Safety Literature Review (Everitt, Lea & Hutter 2018)
Kaj_Sotala (4 May 2018 8:56 UTC): 14 points, 1 comment, 1 min read; linkpost: arxiv.org
Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”
Kaj_Sotala (12 Feb 2018 12:30 UTC): 45 points, 4 comments, 6 min read; linkpost: kajsotala.fi
Papers for 2017
Kaj_Sotala (4 Jan 2018 13:30 UTC): 12 points, 2 comments, 2 min read; linkpost: kajsotala.fi
Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering
Kaj_Sotala (3 Jan 2018 13:57 UTC): 13 points, 0 comments, 1 min read; linkpost: www.informatica.si
Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)
Kaj_Sotala (3 Oct 2017 17:39 UTC): 3 points, 0 comments, 1 min read; linkpost: papers.ssrn.com
[link] Why Self-Control Seems (but may not be) Limited
Kaj_Sotala (20 Jan 2014 16:55 UTC): 55 points, 10 comments, 3 min read
Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower
Kaj_Sotala (6 Dec 2013 9:54 UTC): 34 points, 18 comments, 5 min read
Fallacies as weak Bayesian evidence
Kaj_Sotala (18 Mar 2012 3:53 UTC): 88 points, 42 comments, 10 min read
I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias
Kaj_Sotala (8 Mar 2012 5:39 UTC): 86 points, 40 comments, 9 min read
[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading
avturchin (27 Mar 2018 11:49 UTC): 8 points, 5 comments, 1 min read
IJMC Mind Uploading Special Issue published
Kaj_Sotala (22 Jun 2012 11:58 UTC): 19 points, 12 comments, 1 min read
Bad news for uploading
PhilGoetz (13 Dec 2012 23:32 UTC): 19 points, 6 comments, 1 min read
“Personal Identity and Uploading”, by Mark Walker
gwern (7 Jan 2012 19:55 UTC): 7 points, 19 comments, 16 min read
“Ray Kurzweil and Uploading: Just Say No!”, Nick Agar
gwern (2 Dec 2011 21:42 UTC): 6 points, 79 comments, 6 min read
Publication of “Anthropic Decision Theory”
Stuart_Armstrong (20 Sep 2017 15:41 UTC): 12 points, 9 comments, 1 min read
Computerphile discusses MIRI’s “Logical Induction” paper
Parth Athley (4 Oct 2018 16:00 UTC): 43 points, 2 comments, 1 min read; linkpost: www.youtube.com
New paper from MIRI: “Toward idealized decision theory”
So8res (16 Dec 2014 22:27 UTC): 41 points, 22 comments, 3 min read
Notes/blog posts on two recent MIRI papers
Quinn (14 Jul 2013 23:11 UTC): 35 points, 3 comments, 1 min read
[LINK] International variation in IQ – the role of parasites
David_Gerard (14 May 2012 12:08 UTC): 10 points, 49 comments, 1 min read
IQ Scores Fail to Predict Academic Performance in Children With Autism
InquilineKea (18 Nov 2010 3:34 UTC): 9 points, 9 comments, 2 min read
[LINK] Neuroscientists Find That Status within Groups Can Affect IQ
cafesofie (23 Jan 2012 19:52 UTC): 6 points, 5 comments, 1 min read
New report: Intelligence Explosion Microeconomics
Eliezer Yudkowsky (29 Apr 2013 23:14 UTC): 72 points, 246 comments, 3 min read
The Chromatic Number of the Plane is at Least 5 - Aubrey de Grey
Scott Garrabrant (11 Apr 2018 18:19 UTC): 61 points, 5 comments, 1 min read; linkpost: arxiv.org
[Question] Why is pseudo-alignment “worse” than other ways ML can fail to generalize?
nostalgebraist (18 Jul 2020 22:54 UTC): 45 points, 9 comments, 2 min read
Stanford Encyclopedia of Philosophy on AI ethics and superintelligence
Kaj_Sotala (2 May 2020 7:35 UTC): 43 points, 19 comments, 7 min read; linkpost: plato.stanford.edu
Multiverse-wide Cooperation via Correlated Decision Making
Kaj_Sotala (20 Aug 2017 12:01 UTC): 5 points, 2 comments, 1 min read; linkpost: foundational-research.org
A technical note on bilinear layers for interpretability
Lee Sharkey (8 May 2023 6:06 UTC): 58 points, 0 comments, 1 min read; linkpost: arxiv.org
Papers, Please #1: Various Papers on Employment, Wages and Productivity
Zvi (22 May 2023 12:00 UTC): 42 points, 2 comments, 8 min read; linkpost: thezvi.wordpress.com
Aumann Agreement by Combat
roryokane (5 Apr 2019 5:15 UTC): 14 points, 2 comments, 1 min read; linkpost: sigbovik.org
“A Definition of Subjective Probability” by Anscombe and Aumann
JonahS (24 Jan 2014 20:30 UTC): 14 points, 2 comments, 2 min read
Snyder-Beattie, Sandberg, Drexler & Bonsall (2020): The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare
Kaj_Sotala (24 Nov 2020 10:36 UTC): 83 points, 20 comments, 2 min read; linkpost: www.liebertpub.com
[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI
avturchin (28 Aug 2018 21:32 UTC): 13 points, 2 comments, 1 min read
Comment on “Endogenous Epistemic Factionalization”
Zack_M_Davis (20 May 2020 18:04 UTC): 151 points, 8 comments, 7 min read
Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”
Zack_M_Davis (29 Jun 2020 2:45 UTC): 105 points, 10 comments, 4 min read
Formal Solution to the Inner Alignment Problem
michaelcohen (18 Feb 2021 14:51 UTC): 49 points, 123 comments, 2 min read
Deep limitations? Examining expert disagreement over deep learning
Richard_Ngo (27 Jun 2021 0:55 UTC): 18 points, 6 comments, 1 min read; linkpost: link.springer.com
Entropic boundary conditions towards safe artificial superintelligence
Santiago Nunez-Corrales (20 Jul 2021 22:15 UTC): 3 points, 0 comments, 2 min read; linkpost: www.tandfonline.com
Comment on “Deception as Cooperation”
Zack_M_Davis (27 Nov 2021 4:04 UTC): 23 points, 4 comments, 7 min read
2021 AI Alignment Literature Review and Charity Comparison
Larks (23 Dec 2021 14:06 UTC): 168 points, 28 comments, 73 min read
Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics
Charlie Steiner (18 May 2022 20:52 UTC): 50 points, 8 comments, 14 min read
Paper: Forecasting world events with neural nets
Owain_Evans, Dan H and Joe Kwon (1 Jul 2022 19:40 UTC): 39 points, 3 comments, 4 min read
Poster Session on AI Safety
Neil Crawford (12 Nov 2022 3:50 UTC): 7 points, 6 comments, 1 min read
How to Read Papers Efficiently: Fast-then-Slow Three pass method
the gears to ascension, 1stuserhere and lastuserhere (25 Feb 2023 2:56 UTC): 36 points, 4 comments, 4 min read; linkpost: ccr.sigcomm.org
Effect heterogeneity and external validity in medicine
Anders_H (25 Oct 2019 20:53 UTC): 49 points, 14 comments, 7 min read
Learning biases and rewards simultaneously
Rohin Shah (6 Jul 2019 1:45 UTC): 41 points, 3 comments, 4 min read
Reasoning isn’t about logic (it’s about arguing)
Morendil (14 Mar 2010 4:42 UTC): 66 points, 31 comments, 3 min read
Learning preferences by looking at the world
Rohin Shah (12 Feb 2019 22:25 UTC): 43 points, 10 comments, 7 min read; linkpost: bair.berkeley.edu
[Question] How Old is Smallpox?
Raemon (10 Dec 2018 10:50 UTC): 44 points, 5 comments, 2 min read
Is Caviar a Risk Factor For Being a Millionaire?
Anders_H (9 Dec 2016 16:27 UTC): 67 points, 9 comments, 1 min read
[Link] Computer improves its Civilization II gameplay by reading the manual
Kaj_Sotala (13 Jul 2011 12:00 UTC): 49 points, 5 comments, 4 min read
Article Review: Discovering Latent Knowledge (Burns, Ye, et al)
Robert_AIZI (22 Dec 2022 18:16 UTC): 13 points, 4 comments, 6 min read; linkpost: aizi.substack.com
A Summary Of Anthropic’s First Paper
Sam Ringer (30 Dec 2021 0:48 UTC): 85 points, 1 comment, 8 min read
Generalizing Experimental Results by Leveraging Knowledge of Mechanisms
Carlos_Cinelli (11 Dec 2019 20:39 UTC): 50 points, 5 comments, 1 min read
New paper: Corrigibility with Utility Preservation
Koen.Holtman (6 Aug 2019 19:04 UTC): 44 points, 11 comments, 2 min read
Memory, nutrition, motivation, and genes
PhilGoetz (26 Feb 2013 5:25 UTC): 24 points, 12 comments, 2 min read
Human-AI Collaboration
Rohin Shah (22 Oct 2019 6:32 UTC): 42 points, 7 comments, 2 min read; linkpost: bair.berkeley.edu
“Everything is Correlated”: An Anthology of the Psychology Debate
gwern (27 Apr 2019 13:48 UTC): 41 points, 2 comments, 1 min read; linkpost: www.gwern.net
Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search
Arjun Panickssery (12 Feb 2024 0:56 UTC): 55 points, 13 comments, 3 min read
A discussion of the paper, “Large Language Models are Zero-Shot Reasoners”
HiroSakuraba (26 May 2022 15:55 UTC): 7 points, 0 comments, 4 min read
David Chalmers’ “The Singularity: A Philosophical Analysis”
lukeprog (29 Jan 2011 2:52 UTC): 55 points, 203 comments, 4 min read
Let’s Discuss Functional Decision Theory
Chris_Leong (23 Jul 2018 7:24 UTC): 29 points, 18 comments, 1 min read
Introducing Corrigibility (an FAI research subfield)
So8res (20 Oct 2014 21:09 UTC): 52 points, 28 comments, 3 min read
Counterfactual outcome state transition parameters
Anders_H (27 Jul 2018 21:13 UTC): 37 points, 1 comment, 6 min read
How to escape from your sandbox and from your hardware host
PhilGoetz (31 Jul 2015 17:26 UTC): 43 points, 28 comments, 1 min read
Oracle paper
Stuart_Armstrong (13 Dec 2017 14:59 UTC): 12 points, 7 comments, 1 min read
New paper: The Incentives that Shape Behaviour
RyanCarey (23 Jan 2020 19:07 UTC): 23 points, 5 comments, 1 min read; linkpost: arxiv.org
Dissolving the Fermi Paradox, and what reflection it provides
Jan_Kulveit (30 Jun 2018 16:35 UTC): 28 points, 22 comments, 1 min read; linkpost: arxiv.org
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
DragonGod (6 Dec 2017 6:01 UTC): 13 points, 4 comments, 1 min read; linkpost: arxiv.org
Summary: Surreal Decisions
Chris_Leong (27 Nov 2018 14:15 UTC): 29 points, 20 comments, 3 min read
How Big a Deal are MatMul-Free Transformers?
JustisMills (27 Jun 2024 22:28 UTC): 19 points, 6 comments, 5 min read; linkpost: justismills.substack.com
To Learn Critical Thinking, Study Critical Thinking
gwern (7 Jul 2012 23:50 UTC): 41 points, 16 comments, 11 min read
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi and sofmonk (16 Sep 2024 16:07 UTC): 55 points, 7 comments, 31 min read
‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)
david reinstein (28 Sep 2024 0:32 UTC): 6 points, 0 comments, 2 min read
[Question] Searching for Impossibility Results or No-Go Theorems for provable safety.
Maelstrom (27 Sep 2024 20:12 UTC): 2 points, 1 comment, 1 min read
An Overview of Sparks of Artificial General Intelligence: Early experiments with GPT-4
Annapurna (27 Mar 2023 13:44 UTC): 10 points, 0 comments, 7 min read; linkpost: jorgevelez.substack.com
Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”
Klara Helene Nielsen (20 Jul 2023 17:08 UTC): 4 points, 1 comment, 2 min read; linkpost: journals.sagepub.com
The Physiology of Willpower
pjeby (18 Jun 2009 4:11 UTC): 25 points, 36 comments, 1 min read
Experts vs. parents
PhilGoetz (29 Sep 2010 16:48 UTC): 24 points, 23 comments, 1 min read
The Mind Is Not Designed For Thinking
CronoDAS (26 Mar 2009 21:57 UTC): 9 points, 7 comments, 1 min read
[Link] Persistence of Long-Term Memory in Vitrified and Revived C. elegans worms
Rangi (24 May 2015 3:43 UTC): 35 points, 8 comments, 1 min read
[Question] Can this model grade a test without knowing the answers?
Elizabeth (31 Aug 2019 0:53 UTC): 20 points, 3 comments, 1 min read
Implications of Quantum Computing for Artificial Intelligence Alignment Research
Jsevillamol and PabloAMC (22 Aug 2019 10:33 UTC): 24 points, 3 comments, 13 min read
Citability of Lesswrong and the Alignment Forum
Leon Lang (8 Jan 2023 22:12 UTC): 48 points, 2 comments, 1 min read
Link: Writing exercise closes the gender gap in university-level physics
Vladimir_Golovin (27 Nov 2010 16:28 UTC): 27 points, 9 comments, 1 min read
Donohue, Levitt, Roe, and Wade: T-minus 20 years to a massive crime wave?
Paul Logan (3 Jul 2022 3:03 UTC): −24 points, 6 comments, 3 min read; linkpost: laulpogan.substack.com
Over-encapsulation
PhilGoetz (25 Mar 2010 17:58 UTC): 29 points, 56 comments, 3 min read
FHI paper published in Science: interventions against COVID-19
SoerenMind (16 Dec 2020 21:19 UTC): 119 points, 0 comments, 3 min read
VLM-RM: Specifying Rewards with Natural Language
ChengCheng, David Lindner and Ethan Perez (23 Oct 2023 14:11 UTC): 20 points, 2 comments, 5 min read; linkpost: far.ai
NeurIPS ML Safety Workshop 2022
Dan H (26 Jul 2022 15:28 UTC): 72 points, 2 comments, 1 min read; linkpost: neurips2022.mlsafety.org
[Question] How can we secure more research positions at our universities for x-risk researchers?
Neil Crawford (6 Sep 2022 17:17 UTC): 11 points, 0 comments, 1 min read
That one apocalyptic nuclear famine paper is bunk
Lao Mein (12 Oct 2022 3:33 UTC): 110 points, 10 comments, 1 min read
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Henry Cai (16 Jun 2024 13:01 UTC): 7 points, 0 comments, 7 min read; linkpost: arxiv.org
Hope Function
gwern (1 Jul 2012 15:40 UTC): 38 points, 8 comments, 1 min read
Rawls’s Veil of Ignorance Doesn’t Make Any Sense
Arjun Panickssery (24 Feb 2024 13:18 UTC): 10 points, 9 comments, 1 min read
How You Can Gain Self Control Without “Self-Control”
spencerg (24 Mar 2021 23:38 UTC): 109 points, 41 comments, 23 min read
Functional Trade-offs
weathersystems (19 May 2021 1:06 UTC): 5 points, 0 comments, 6 min read
“Are Experiments Possible?” Seeds of Science call for reviewers
rogersbacon (2 Nov 2022 20:05 UTC): 8 points, 0 comments, 1 min read
Characterizing Intrinsic Compositionality in Transformers with Tree Projections
Ulisse Mini (13 Nov 2022 9:46 UTC): 12 points, 2 comments, 1 min read; linkpost: arxiv.org
How truthful is GPT-3? A benchmark for language models
Owain_Evans (16 Sep 2021 10:09 UTC): 58 points, 24 comments, 6 min read
Walkthrough of the Tiling Agents for Self-Modifying AI paper
So8res (13 Dec 2013 3:23 UTC): 29 points, 18 comments, 21 min read
Doing your good deed for the day
Scott Alexander (27 Oct 2009 0:45 UTC): 152 points, 57 comments, 3 min read
[linkpost] Acquisition of Chess Knowledge in AlphaZero
Quintin Pope (23 Nov 2021 7:55 UTC): 8 points, 1 comment, 1 min read
Demanding and Designing Aligned Cognitive Architectures
Koen.Holtman (21 Dec 2021 17:32 UTC): 8 points, 5 comments, 5 min read
Even if you have a nail, not all hammers are the same
PhilGoetz (29 Mar 2010 18:09 UTC): 150 points, 126 comments, 6 min read
Less Competition, More Meritocracy?
Zvi (20 Jan 2019 2:00 UTC): 85 points, 19 comments, 20 min read, 3 reviews; linkpost: thezvi.wordpress.com
A New Interpretation of the Marshmallow Test
elharo (5 Jul 2013 12:22 UTC): 119 points, 25 comments, 2 min read
Good News for Immunostimulants
sarahconstantin (16 Apr 2018 16:10 UTC): 26 points, 9 comments, 2 min read; linkpost: srconstantin.wordpress.com
Let’s Read: Superhuman AI for multiplayer poker
Yuxi_Liu (14 Jul 2019 6:22 UTC): 56 points, 6 comments, 8 min read
Tiling Agents for Self-Modifying AI (OPFAI #2)
Eliezer Yudkowsky (6 Jun 2013 20:24 UTC): 88 points, 259 comments, 3 min read
The Vulnerable World Hypothesis (by Bostrom)
Ben Pace (6 Nov 2018 20:05 UTC): 50 points, 17 comments, 4 min read; linkpost: nickbostrom.com
DeepMind article: AI Safety Gridworlds
scarcegreengrass (30 Nov 2017 16:13 UTC): 25 points, 6 comments, 1 min read; linkpost: deepmind.com
Claims & Assumptions made in Eternity in Six Hours
Ruby (8 May 2019 23:11 UTC): 50 points, 7 comments, 3 min read
[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv
DragonGod (21 Nov 2019 1:18 UTC): 52 points, 4 comments, 1 min read; linkpost: arxiv.org