
Academic Papers

Last edit: Feb 8, 2025, 12:32 AM by lesswrong-internal

Posts either linking to, or summarizing, formal papers published elsewhere.

Some AI research areas and their relevance to existential safety

Andrew_Critch · Nov 19, 2020, 3:18 AM
205 points
37 comments · 50 min read · LW link · 2 reviews

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · Jan 5, 2024, 8:46 AM
37 points
4 comments · 2 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · Nov 28, 2023, 7:56 PM
64 points
30 comments · 11 min read · LW link

Thirty-three randomly selected bioethics papers

Rob Bensinger · Mar 22, 2021, 9:38 PM
115 points
46 comments · 50 min read · LW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 27, 2022, 5:27 PM
50 points
0 comments · 4 min read · LW link
(aizi.substack.com)

Publication of “Anthropic Decision Theory”

Stuart_Armstrong · Sep 20, 2017, 3:41 PM
12 points
9 comments · 1 min read · LW link

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner · Jun 4, 2024, 3:50 PM
120 points
14 comments · 13 min read · LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Jul 8, 2024, 10:24 PM
107 points
36 comments · 5 min read · LW link

Paper club: He et al. on modular arithmetic (part I)

Dmitry Vaintrob · Jan 13, 2025, 11:18 AM
13 points
0 comments · 8 min read · LW link

New paper: Long-Term Trajectories of Human Civilization

Kaj_Sotala · Aug 12, 2018, 9:10 AM
33 points
1 comment · 2 min read · LW link
(kajsotala.fi)

Study on what makes people approve or condemn mind upload technology; references LW

Kaj_Sotala · Jul 10, 2018, 5:14 PM
22 points
0 comments · 2 min read · LW link
(www.nature.com)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala · May 4, 2018, 8:56 AM
14 points
1 comment · 1 min read · LW link
(arxiv.org)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala · Feb 12, 2018, 12:30 PM
45 points
4 comments · 6 min read · LW link
(kajsotala.fi)

Papers for 2017

Kaj_Sotala · Jan 4, 2018, 1:30 PM
12 points
2 comments · 2 min read · LW link
(kajsotala.fi)

Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering

Kaj_Sotala · Jan 3, 2018, 1:57 PM
13 points
0 comments · 1 min read · LW link
(www.informatica.si)

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

Kaj_Sotala · Oct 3, 2017, 5:39 PM
3 points
0 comments · 1 min read · LW link
(papers.ssrn.com)

[link] Why Self-Control Seems (but may not be) Limited

Kaj_Sotala · Jan 20, 2014, 4:55 PM
55 points
10 comments · 3 min read · LW link

Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower

Kaj_Sotala · Dec 6, 2013, 9:54 AM
34 points
18 comments · 5 min read · LW link

Fallacies as weak Bayesian evidence

Kaj_Sotala · Mar 18, 2012, 3:53 AM
89 points
42 comments · 10 min read · LW link

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

Kaj_Sotala · Mar 8, 2012, 5:39 AM
86 points
40 comments · 9 min read · LW link

[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading

avturchin · Mar 27, 2018, 11:49 AM
8 points
5 comments · 1 min read · LW link

IJMC Mind Uploading Special Issue published

Kaj_Sotala · Jun 22, 2012, 11:58 AM
19 points
12 comments · 1 min read · LW link

Bad news for uploading

PhilGoetz · Dec 13, 2012, 11:32 PM
19 points
6 comments · 1 min read · LW link

“Personal Identity and Uploading”, by Mark Walker

gwern · Jan 7, 2012, 7:55 PM
7 points
19 comments · 16 min read · LW link

“Ray Kurzweil and Uploading: Just Say No!”, Nick Agar

gwern · Dec 2, 2011, 9:42 PM
6 points
79 comments · 6 min read · LW link

SSC Journal Club: AI Timelines

Scott Alexander · Jun 8, 2017, 7:00 PM
15 points
16 comments · 8 min read · LW link

Computerphile discusses MIRI’s “Logical Induction” paper

Parth Athley · Oct 4, 2018, 4:00 PM
43 points
2 comments · 1 min read · LW link
(www.youtube.com)

New paper from MIRI: “Toward idealized decision theory”

So8res · Dec 16, 2014, 10:27 PM
41 points
22 comments · 3 min read · LW link

Notes/blog posts on two recent MIRI papers

Quinn · Jul 14, 2013, 11:11 PM
35 points
3 comments · 1 min read · LW link

[LINK] International variation in IQ – the role of parasites

David_Gerard · May 14, 2012, 12:08 PM
10 points
49 comments · 1 min read · LW link

IQ Scores Fail to Predict Academic Performance in Children With Autism

InquilineKea · Nov 18, 2010, 3:34 AM
9 points
9 comments · 2 min read · LW link

[LINK] Neuroscientists Find That Status within Groups Can Affect IQ

cafesofie · Jan 23, 2012, 7:52 PM
6 points
5 comments · 1 min read · LW link

New report: Intelligence Explosion Microeconomics

Eliezer Yudkowsky · Apr 29, 2013, 11:14 PM
72 points
246 comments · 3 min read · LW link

The Chromatic Number of the Plane is at Least 5 - Aubrey de Grey

Scott Garrabrant · Apr 11, 2018, 6:19 PM
61 points
5 comments · 1 min read · LW link
(arxiv.org)

[Question] Why is pseudo-alignment “worse” than other ways ML can fail to generalize?

nostalgebraist · Jul 18, 2020, 10:54 PM
45 points
9 comments · 2 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala · May 2, 2020, 7:35 AM
43 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

Multiverse-wide Cooperation via Correlated Decision Making

Kaj_Sotala · Aug 20, 2017, 12:01 PM
5 points
2 comments · 1 min read · LW link
(foundational-research.org)

A technical note on bilinear layers for interpretability

Lee Sharkey · May 8, 2023, 6:06 AM
58 points
0 comments · 1 min read · LW link
(arxiv.org)

Papers, Please #1: Various Papers on Employment, Wages and Productivity

Zvi · May 22, 2023, 12:00 PM
42 points
2 comments · 8 min read · LW link
(thezvi.wordpress.com)

Aumann Agreement by Combat

roryokane · Apr 5, 2019, 5:15 AM
14 points
2 comments · 1 min read · LW link
(sigbovik.org)

“A Definition of Subjective Probability” by Anscombe and Aumann

JonahS · Jan 24, 2014, 8:30 PM
14 points
2 comments · 2 min read · LW link

Snyder-Beattie, Sandberg, Drexler & Bonsall (2020): The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare

Kaj_Sotala · Nov 24, 2020, 10:36 AM
83 points
20 comments · 2 min read · LW link
(www.liebertpub.com)

[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI

avturchin · Aug 28, 2018, 9:32 PM
13 points
2 comments · 1 min read · LW link

Comment on “Endogenous Epistemic Factionalization”

Zack_M_Davis · May 20, 2020, 6:04 PM
158 points
8 comments · 7 min read · LW link

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis · Jun 29, 2020, 2:45 AM
105 points
10 comments · 4 min read · LW link

Formal Solution to the Inner Alignment Problem

michaelcohen · Feb 18, 2021, 2:51 PM
49 points
123 comments · 2 min read · LW link

Deep limitations? Examining expert disagreement over deep learning

Richard_Ngo · Jun 27, 2021, 12:55 AM
18 points
6 comments · 1 min read · LW link
(link.springer.com)

Entropic boundary conditions towards safe artificial superintelligence

Santiago Nunez-Corrales · Jul 20, 2021, 10:15 PM
3 points
0 comments · 2 min read · LW link
(www.tandfonline.com)

Comment on “Deception as Cooperation”

Zack_M_Davis · Nov 27, 2021, 4:04 AM
23 points
4 comments · 7 min read · LW link

2021 AI Alignment Literature Review and Charity Comparison

Larks · Dec 23, 2021, 2:06 PM
168 points
28 comments · 73 min read · LW link

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner · May 18, 2022, 8:52 PM
50 points
8 comments · 14 min read · LW link

Paper: Forecasting world events with neural nets

Jul 1, 2022, 7:40 PM
39 points
3 comments · 4 min read · LW link

Poster Session on AI Safety

Neil Crawford · Nov 12, 2022, 3:50 AM
7 points
6 comments · 1 min read · LW link

How to Read Papers Efficiently: Fast-then-Slow Three pass method

Feb 25, 2023, 2:56 AM
36 points
4 comments · 4 min read · LW link
(ccr.sigcomm.org)

Claims & Assumptions made in Eternity in Six Hours

Ruby · May 8, 2019, 11:11 PM
50 points
7 comments · 3 min read · LW link

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv

DragonGod · Nov 21, 2019, 1:18 AM
52 points
4 comments · 1 min read · LW link
(arxiv.org)

Effect heterogeneity and external validity in medicine

Anders_H · Oct 25, 2019, 8:53 PM
49 points
14 comments · 7 min read · LW link

Learning biases and rewards simultaneously

Rohin Shah · Jul 6, 2019, 1:45 AM
41 points
3 comments · 4 min read · LW link

Reasoning isn’t about logic (it’s about arguing)

Morendil · Mar 14, 2010, 4:42 AM
66 points
31 comments · 3 min read · LW link

Learning preferences by looking at the world

Rohin Shah · Feb 12, 2019, 10:25 PM
43 points
10 comments · 7 min read · LW link
(bair.berkeley.edu)

[Question] How Old is Smallpox?

Raemon · Dec 10, 2018, 10:50 AM
44 points
5 comments · 2 min read · LW link

Is Caviar a Risk Factor For Being a Millionaire?

Anders_H · Dec 9, 2016, 4:27 PM
67 points
9 comments · 1 min read · LW link

[Link] Computer improves its Civilization II gameplay by reading the manual

Kaj_Sotala · Jul 13, 2011, 12:00 PM
49 points
5 comments · 4 min read · LW link

Article Review: Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 22, 2022, 6:16 PM
13 points
4 comments · 6 min read · LW link
(aizi.substack.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · Dec 30, 2021, 12:48 AM
85 points
1 comment · 8 min read · LW link

Generalizing Experimental Results by Leveraging Knowledge of Mechanisms

Carlos_Cinelli · Dec 11, 2019, 8:39 PM
50 points
5 comments · 1 min read · LW link

New paper: Corrigibility with Utility Preservation

Koen.Holtman · Aug 6, 2019, 7:04 PM
44 points
11 comments · 2 min read · LW link

Memory, nutrition, motivation, and genes

PhilGoetz · Feb 26, 2013, 5:25 AM
24 points
12 comments · 2 min read · LW link

Human-AI Collaboration

Rohin Shah · Oct 22, 2019, 6:32 AM
42 points
7 comments · 2 min read · LW link
(bair.berkeley.edu)

“Everything is Correlated”: An Anthology of the Psychology Debate

gwern · Apr 27, 2019, 1:48 PM
41 points
2 comments · 1 min read · LW link
(www.gwern.net)

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · Feb 12, 2024, 12:56 AM
57 points
13 comments · 3 min read · LW link

A discussion of the paper, “Large Language Models are Zero-Shot Reasoners”

HiroSakuraba · May 26, 2022, 3:55 PM
7 points
0 comments · 4 min read · LW link

David Chalmers’ “The Singularity: A Philosophical Analysis”

lukeprog · Jan 29, 2011, 2:52 AM
55 points
203 comments · 4 min read · LW link

Let’s Discuss Functional Decision Theory

Chris_Leong · Jul 23, 2018, 7:24 AM
29 points
18 comments · 1 min read · LW link

Introducing Corrigibility (an FAI research subfield)

So8res · Oct 20, 2014, 9:09 PM
52 points
28 comments · 3 min read · LW link

Counterfactual outcome state transition parameters

Anders_H · Jul 27, 2018, 9:13 PM
37 points
1 comment · 6 min read · LW link

How to escape from your sandbox and from your hardware host

PhilGoetz · Jul 31, 2015, 5:26 PM
43 points
28 comments · 1 min read · LW link

Oracle paper

Stuart_Armstrong · Dec 13, 2017, 2:59 PM
12 points
7 comments · 1 min read · LW link

New paper: The Incentives that Shape Behaviour

RyanCarey · Jan 23, 2020, 7:07 PM
23 points
5 comments · 1 min read · LW link
(arxiv.org)

Dissolving the Fermi Paradox, and what reflection it provides

Jan_Kulveit · Jun 30, 2018, 4:35 PM
28 points
22 comments · 1 min read · LW link
(arxiv.org)

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

DragonGod · Dec 6, 2017, 6:01 AM
13 points
4 comments · 1 min read · LW link
(arxiv.org)

How Big a Deal are MatMul-Free Transformers?

JustisMills · Jun 27, 2024, 10:28 PM
19 points
6 comments · 5 min read · LW link
(justismills.substack.com)

Summary: Surreal Decisions

Chris_Leong · Nov 27, 2018, 2:15 PM
29 points
20 comments · 3 min read · LW link

Secret Collusion: Will We Know When to Unplug AI?

Sep 16, 2024, 4:07 PM
56 points
7 comments · 31 min read · LW link

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david reinstein · Sep 28, 2024, 12:32 AM
6 points
0 comments · 2 min read · LW link

[Question] Searching for Impossibility Results or No-Go Theorems for provable safety.

Maelstrom · Sep 27, 2024, 8:12 PM
2 points
1 comment · 1 min read · LW link

To Learn Critical Thinking, Study Critical Thinking

gwern · Jul 7, 2012, 11:50 PM
41 points
16 comments · 11 min read · LW link

Monet: Mixture of Monosemantic Experts for Transformers Explained

CalebMaresca · Jan 25, 2025, 7:37 PM
19 points
2 comments · 11 min read · LW link

Shallow review of technical AI safety, 2024

Dec 29, 2024, 12:01 PM
183 points
34 comments · 41 min read · LW link

An Overview of Sparks of Artificial General Intelligence: Early experiments with GPT-4

Annapurna · Mar 27, 2023, 1:44 PM
10 points
0 comments · 7 min read · LW link
(jorgevelez.substack.com)

Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”

Klara Helene Nielsen · Jul 20, 2023, 5:08 PM
4 points
1 comment · 2 min read · LW link
(journals.sagepub.com)

The Physiology of Willpower

pjeby · Jun 18, 2009, 4:11 AM
25 points
36 comments · 1 min read · LW link

Experts vs. parents

PhilGoetz · Sep 29, 2010, 4:48 PM
24 points
23 comments · 1 min read · LW link

The Mind Is Not Designed For Thinking

CronoDAS · Mar 26, 2009, 9:57 PM
9 points
7 comments · 1 min read · LW link

[Link] Persistence of Long-Term Memory in Vitrified and Revived C. elegans worms

Rangi · May 24, 2015, 3:43 AM
35 points
8 comments · 1 min read · LW link

[Question] Can this model grade a test without knowing the answers?

Elizabeth · Aug 31, 2019, 12:53 AM
20 points
3 comments · 1 min read · LW link

Implications of Quantum Computing for Artificial Intelligence Alignment Research

Aug 22, 2019, 10:33 AM
24 points
3 comments · 13 min read · LW link

Citability of Lesswrong and the Alignment Forum

Leon Lang · Jan 8, 2023, 10:12 PM
48 points
2 comments · 1 min read · LW link

Link: Writing exercise closes the gender gap in university-level physics

Vladimir_Golovin · Nov 27, 2010, 4:28 PM
27 points
9 comments · 1 min read · LW link

Donohue, Levitt, Roe, and Wade: T-minus 20 years to a massive crime wave?

Paul Logan · Jul 3, 2022, 3:03 AM
−24 points
6 comments · 3 min read · LW link
(laulpogan.substack.com)

Over-encapsulation

PhilGoetz · Mar 25, 2010, 5:58 PM
29 points
56 comments · 3 min read · LW link

FHI paper published in Science: interventions against COVID-19

SoerenMind · Dec 16, 2020, 9:19 PM
119 points
0 comments · 3 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

Oct 23, 2023, 2:11 PM
20 points
2 comments · 5 min read · LW link
(far.ai)

NeurIPS ML Safety Workshop 2022

Dan H · Jul 26, 2022, 3:28 PM
72 points
2 comments · 1 min read · LW link
(neurips2022.mlsafety.org)

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford · Sep 6, 2022, 5:17 PM
11 points
0 comments · 1 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai · Jun 16, 2024, 1:01 PM
7 points
0 comments · 7 min read · LW link
(arxiv.org)

That one apocalyptic nuclear famine paper is bunk

Lao Mein · Oct 12, 2022, 3:33 AM
110 points
10 comments · 1 min read · LW link

Hope Function

gwern · Jul 1, 2012, 3:40 PM
38 points
8 comments · 1 min read · LW link

Rawls’s Veil of Ignorance Doesn’t Make Any Sense

Arjun Panickssery · Feb 24, 2024, 1:18 PM
10 points
9 comments · 1 min read · LW link

How You Can Gain Self Control Without “Self-Control”

spencerg · Mar 24, 2021, 11:38 PM
109 points
41 comments · 23 min read · LW link

Functional Trade-offs

weathersystems · May 19, 2021, 1:06 AM
5 points
0 comments · 6 min read · LW link

“Are Experiments Possible?” Seeds of Science call for reviewers

rogersbacon · Nov 2, 2022, 8:05 PM
8 points
0 comments · 1 min read · LW link

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

Ulisse Mini · Nov 13, 2022, 9:46 AM
12 points
2 comments · 1 min read · LW link
(arxiv.org)

How truthful is GPT-3? A benchmark for language models

Owain_Evans · Sep 16, 2021, 10:09 AM
58 points
24 comments · 6 min read · LW link

Walkthrough of the Tiling Agents for Self-Modifying AI paper

So8res · Dec 13, 2013, 3:23 AM
29 points
18 comments · 21 min read · LW link

Doing your good deed for the day

Scott Alexander · Oct 27, 2009, 12:45 AM
152 points
57 comments · 3 min read · LW link

[linkpost] Acquisition of Chess Knowledge in AlphaZero

Quintin Pope · Nov 23, 2021, 7:55 AM
8 points
1 comment · 1 min read · LW link

Demanding and Designing Aligned Cognitive Architectures

Koen.Holtman · Dec 21, 2021, 5:32 PM
8 points
5 comments · 5 min read · LW link

Even if you have a nail, not all hammers are the same

PhilGoetz · Mar 29, 2010, 6:09 PM
150 points
126 comments · 6 min read · LW link

Less Competition, More Meritocracy?

Zvi · Jan 20, 2019, 2:00 AM
85 points
19 comments · 20 min read · LW link · 3 reviews
(thezvi.wordpress.com)

A New Interpretation of the Marshmallow Test

elharo · Jul 5, 2013, 12:22 PM
119 points
25 comments · 2 min read · LW link

Good News for Immunostimulants

sarahconstantin · Apr 16, 2018, 4:10 PM
26 points
9 comments · 2 min read · LW link
(srconstantin.wordpress.com)

Let’s Read: Superhuman AI for multiplayer poker

Yuxi_Liu · Jul 14, 2019, 6:22 AM
56 points
6 comments · 8 min read · LW link

Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky · Jun 6, 2013, 8:24 PM
88 points
259 comments · 3 min read · LW link

The Vulnerable World Hypothesis (by Bostrom)

Ben Pace · Nov 6, 2018, 8:05 PM
50 points
17 comments · 4 min read · LW link
(nickbostrom.com)

DeepMind article: AI Safety Gridworlds

scarcegreengrass · Nov 30, 2017, 4:13 PM
25 points
6 comments · 1 min read · LW link
(deepmind.com)