
Academic Papers

Last edit: Feb 8, 2025, 12:32 AM by lesswrong-internal

Posts either linking to, or summarizing, formal papers published elsewhere.

Some AI research areas and their relevance to existential safety

Andrew_Critch · Nov 19, 2020, 3:18 AM
205 points
37 comments · 50 min read · LW link · 2 reviews

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · Jan 5, 2024, 8:46 AM
37 points
4 comments · 2 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · Nov 28, 2023, 7:56 PM
64 points
30 comments · 11 min read · LW link

Thirty-three randomly selected bioethics papers

Rob Bensinger · Mar 22, 2021, 9:38 PM
115 points
46 comments · 50 min read · LW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 27, 2022, 5:27 PM
50 points
0 comments · 4 min read · LW link
(aizi.substack.com)

Publication of “Anthropic Decision Theory”

Stuart_Armstrong · Sep 20, 2017, 3:41 PM
12 points
9 comments · 1 min read · LW link

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner · Jun 4, 2024, 3:50 PM
120 points
14 comments · 13 min read · LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Jul 8, 2024, 10:24 PM
107 points
36 comments · 5 min read · LW link

Paper club: He et al. on modular arithmetic (part I)

Dmitry Vaintrob · Jan 13, 2025, 11:18 AM
13 points
0 comments · 8 min read · LW link

New paper: Long-Term Trajectories of Human Civilization

Kaj_Sotala · Aug 12, 2018, 9:10 AM
33 points
1 comment · 2 min read · LW link
(kajsotala.fi)

Study on what makes people approve or condemn mind upload technology; references LW

Kaj_Sotala · Jul 10, 2018, 5:14 PM
22 points
0 comments · 2 min read · LW link
(www.nature.com)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala · May 4, 2018, 8:56 AM
14 points
1 comment · 1 min read · LW link
(arxiv.org)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala · Feb 12, 2018, 12:30 PM
45 points
4 comments · 6 min read · LW link
(kajsotala.fi)

Papers for 2017

Kaj_Sotala · Jan 4, 2018, 1:30 PM
12 points
2 comments · 2 min read · LW link
(kajsotala.fi)

Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering

Kaj_Sotala · Jan 3, 2018, 1:57 PM
13 points
0 comments · 1 min read · LW link
(www.informatica.si)

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

Kaj_Sotala · Oct 3, 2017, 5:39 PM
3 points
0 comments · 1 min read · LW link
(papers.ssrn.com)

[link] Why Self-Control Seems (but may not be) Limited

Kaj_Sotala · Jan 20, 2014, 4:55 PM
55 points
10 comments · 3 min read · LW link

Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower

Kaj_Sotala · Dec 6, 2013, 9:54 AM
34 points
18 comments · 5 min read · LW link

Fallacies as weak Bayesian evidence

Kaj_Sotala · Mar 18, 2012, 3:53 AM
89 points
42 comments · 10 min read · LW link

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

Kaj_Sotala · Mar 8, 2012, 5:39 AM
86 points
40 comments · 9 min read · LW link

[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading

avturchin · Mar 27, 2018, 11:49 AM
8 points
5 comments · 1 min read · LW link

IJMC Mind Uploading Special Issue published

Kaj_Sotala · Jun 22, 2012, 11:58 AM
19 points
12 comments · 1 min read · LW link

Bad news for uploading

PhilGoetz · Dec 13, 2012, 11:32 PM
19 points
6 comments · 1 min read · LW link

“Personal Identity and Uploading”, by Mark Walker

gwern · Jan 7, 2012, 7:55 PM
7 points
19 comments · 16 min read · LW link

“Ray Kurzweil and Uploading: Just Say No!”, Nick Agar

gwern · Dec 2, 2011, 9:42 PM
6 points
79 comments · 6 min read · LW link

SSC Journal Club: AI Timelines

Scott Alexander · Jun 8, 2017, 7:00 PM
15 points
16 comments · 8 min read · LW link

Computerphile discusses MIRI’s “Logical Induction” paper

Parth Athley · Oct 4, 2018, 4:00 PM
43 points
2 comments · 1 min read · LW link
(www.youtube.com)

New paper from MIRI: “Toward idealized decision theory”

So8res · Dec 16, 2014, 10:27 PM
41 points
22 comments · 3 min read · LW link

Notes/blog posts on two recent MIRI papers

Quinn · Jul 14, 2013, 11:11 PM
35 points
3 comments · 1 min read · LW link

[LINK] International variation in IQ – the role of parasites

David_Gerard · May 14, 2012, 12:08 PM
10 points
49 comments · 1 min read · LW link

IQ Scores Fail to Predict Academic Performance in Children With Autism

InquilineKea · Nov 18, 2010, 3:34 AM
9 points
9 comments · 2 min read · LW link

[LINK] Neuroscientists Find That Status within Groups Can Affect IQ

cafesofie · Jan 23, 2012, 7:52 PM
6 points
5 comments · 1 min read · LW link

New report: Intelligence Explosion Microeconomics

Eliezer Yudkowsky · Apr 29, 2013, 11:14 PM
72 points
246 comments · 3 min read · LW link

The Chromatic Number of the Plane is at Least 5 - Aubrey de Grey

Scott Garrabrant · Apr 11, 2018, 6:19 PM
61 points
5 comments · 1 min read · LW link
(arxiv.org)

[Question] Why is pseudo-alignment “worse” than other ways ML can fail to generalize?

nostalgebraist · Jul 18, 2020, 10:54 PM
45 points
9 comments · 2 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala · May 2, 2020, 7:35 AM
43 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

Multiverse-wide Cooperation via Correlated Decision Making

Kaj_Sotala · Aug 20, 2017, 12:01 PM
5 points
2 comments · 1 min read · LW link
(foundational-research.org)

A technical note on bilinear layers for interpretability

Lee Sharkey · May 8, 2023, 6:06 AM
58 points
0 comments · 1 min read · LW link
(arxiv.org)

Papers, Please #1: Various Papers on Employment, Wages and Productivity

Zvi · May 22, 2023, 12:00 PM
42 points
2 comments · 8 min read · LW link
(thezvi.wordpress.com)

Aumann Agreement by Combat

roryokane · Apr 5, 2019, 5:15 AM
14 points
2 comments · 1 min read · LW link
(sigbovik.org)

“A Definition of Subjective Probability” by Anscombe and Aumann

JonahS · Jan 24, 2014, 8:30 PM
14 points
2 comments · 2 min read · LW link

Snyder-Beattie, Sandberg, Drexler & Bonsall (2020): The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare

Kaj_Sotala · Nov 24, 2020, 10:36 AM
83 points
20 comments · 2 min read · LW link
(www.liebertpub.com)

[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI

avturchin · Aug 28, 2018, 9:32 PM
13 points
2 comments · 1 min read · LW link

Comment on “Endogenous Epistemic Factionalization”

Zack_M_Davis · May 20, 2020, 6:04 PM
158 points
8 comments · 7 min read · LW link

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis · Jun 29, 2020, 2:45 AM
105 points
10 comments · 4 min read · LW link

Formal Solution to the Inner Alignment Problem

michaelcohen · Feb 18, 2021, 2:51 PM
49 points
123 comments · 2 min read · LW link

Deep limitations? Examining expert disagreement over deep learning

Richard_Ngo · Jun 27, 2021, 12:55 AM
18 points
6 comments · 1 min read · LW link
(link.springer.com)

Entropic boundary conditions towards safe artificial superintelligence

Santiago Nunez-Corrales · Jul 20, 2021, 10:15 PM
3 points
0 comments · 2 min read · LW link
(www.tandfonline.com)

Comment on “Deception as Cooperation”

Zack_M_Davis · Nov 27, 2021, 4:04 AM
23 points
4 comments · 7 min read · LW link

2021 AI Alignment Literature Review and Charity Comparison

Larks · Dec 23, 2021, 2:06 PM
168 points
28 comments · 73 min read · LW link

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner · May 18, 2022, 8:52 PM
50 points
8 comments · 14 min read · LW link

Paper: Forecasting world events with neural nets

Jul 1, 2022, 7:40 PM
39 points
3 comments · 4 min read · LW link

Poster Session on AI Safety

Neil Crawford · Nov 12, 2022, 3:50 AM
7 points
6 comments · 1 min read · LW link

How to Read Papers Efficiently: Fast-then-Slow Three pass method

Feb 25, 2023, 2:56 AM
36 points
4 comments · 4 min read · LW link
(ccr.sigcomm.org)

Claims & Assumptions made in Eternity in Six Hours

Ruby · May 8, 2019, 11:11 PM
50 points
7 comments · 3 min read · LW link

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv

DragonGod · Nov 21, 2019, 1:18 AM
52 points
4 comments · 1 min read · LW link
(arxiv.org)

Effect heterogeneity and external validity in medicine

Anders_H · Oct 25, 2019, 8:53 PM
49 points
14 comments · 7 min read · LW link

Learning biases and rewards simultaneously

Rohin Shah · Jul 6, 2019, 1:45 AM
41 points
3 comments · 4 min read · LW link

Reasoning isn’t about logic (it’s about arguing)

Morendil · Mar 14, 2010, 4:42 AM
66 points
31 comments · 3 min read · LW link

Learning preferences by looking at the world

Rohin Shah · Feb 12, 2019, 10:25 PM
43 points
10 comments · 7 min read · LW link
(bair.berkeley.edu)

[Question] How Old is Smallpox?

Raemon · Dec 10, 2018, 10:50 AM
44 points
5 comments · 2 min read · LW link

Is Caviar a Risk Factor For Being a Millionaire?

Anders_H · Dec 9, 2016, 4:27 PM
67 points
9 comments · 1 min read · LW link

[Link] Computer improves its Civilization II gameplay by reading the manual

Kaj_Sotala · Jul 13, 2011, 12:00 PM
49 points
5 comments · 4 min read · LW link

Article Review: Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 22, 2022, 6:16 PM
13 points
4 comments · 6 min read · LW link
(aizi.substack.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · Dec 30, 2021, 12:48 AM
85 points
1 comment · 8 min read · LW link

Generalizing Experimental Results by Leveraging Knowledge of Mechanisms

Carlos_Cinelli · Dec 11, 2019, 8:39 PM
50 points
5 comments · 1 min read · LW link

New paper: Corrigibility with Utility Preservation

Koen.Holtman · Aug 6, 2019, 7:04 PM
44 points
11 comments · 2 min read · LW link

Memory, nutrition, motivation, and genes

PhilGoetz · Feb 26, 2013, 5:25 AM
24 points
12 comments · 2 min read · LW link

Human-AI Collaboration

Rohin Shah · Oct 22, 2019, 6:32 AM
42 points
7 comments · 2 min read · LW link
(bair.berkeley.edu)

“Everything is Correlated”: An Anthology of the Psychology Debate

gwern · Apr 27, 2019, 1:48 PM
41 points
2 comments · 1 min read · LW link
(www.gwern.net)

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · Feb 12, 2024, 12:56 AM
57 points
13 comments · 3 min read · LW link

A discussion of the paper, “Large Language Models are Zero-Shot Reasoners”

HiroSakuraba · May 26, 2022, 3:55 PM
7 points
0 comments · 4 min read · LW link

David Chalmers’ “The Singularity: A Philosophical Analysis”

lukeprog · Jan 29, 2011, 2:52 AM
55 points
203 comments · 4 min read · LW link

Let’s Discuss Functional Decision Theory

Chris_Leong · Jul 23, 2018, 7:24 AM
29 points
18 comments · 1 min read · LW link

Introducing Corrigibility (an FAI research subfield)

So8res · Oct 20, 2014, 9:09 PM
52 points
28 comments · 3 min read · LW link

Counterfactual outcome state transition parameters

Anders_H · Jul 27, 2018, 9:13 PM
37 points
1 comment · 6 min read · LW link

How to escape from your sandbox and from your hardware host

PhilGoetz · Jul 31, 2015, 5:26 PM
43 points
28 comments · 1 min read · LW link

Oracle paper

Stuart_Armstrong · Dec 13, 2017, 2:59 PM
12 points
7 comments · 1 min read · LW link

New paper: The Incentives that Shape Behaviour

RyanCarey · Jan 23, 2020, 7:07 PM
23 points
5 comments · 1 min read · LW link
(arxiv.org)

Dissolving the Fermi Paradox, and what reflection it provides

Jan_Kulveit · Jun 30, 2018, 4:35 PM
28 points
22 comments · 1 min read · LW link
(arxiv.org)

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

DragonGod · Dec 6, 2017, 6:01 AM
13 points
4 comments · 1 min read · LW link
(arxiv.org)

How Big a Deal are MatMul-Free Transformers?

JustisMills · Jun 27, 2024, 10:28 PM
19 points
6 comments · 5 min read · LW link
(justismills.substack.com)

Summary: Surreal Decisions

Chris_Leong · Nov 27, 2018, 2:15 PM
29 points
20 comments · 3 min read · LW link

Secret Collusion: Will We Know When to Unplug AI?

Sep 16, 2024, 4:07 PM
56 points
7 comments · 31 min read · LW link

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david reinstein · Sep 28, 2024, 12:32 AM
6 points
0 comments · 2 min read · LW link

[Question] Searching for Impossibility Results or No-Go Theorems for provable safety.

Maelstrom · Sep 27, 2024, 8:12 PM
2 points
1 comment · 1 min read · LW link

To Learn Critical Thinking, Study Critical Thinking

gwern · Jul 7, 2012, 11:50 PM
41 points
16 comments · 11 min read · LW link

Monet: Mixture of Monosemantic Experts for Transformers Explained

CalebMaresca · Jan 25, 2025, 7:37 PM
19 points
2 comments · 11 min read · LW link

Shallow review of technical AI safety, 2024

Dec 29, 2024, 12:01 PM
183 points
34 comments · 41 min read · LW link

An Overview of Sparks of Artificial General Intelligence: Early experiments with GPT-4

Annapurna · Mar 27, 2023, 1:44 PM
10 points
0 comments · 7 min read · LW link
(jorgevelez.substack.com)

Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”

Klara Helene Nielsen · Jul 20, 2023, 5:08 PM
4 points
1 comment · 2 min read · LW link
(journals.sagepub.com)

The Physiology of Willpower

pjeby · Jun 18, 2009, 4:11 AM
25 points
36 comments · 1 min read · LW link

Experts vs. parents

PhilGoetz · Sep 29, 2010, 4:48 PM
24 points
23 comments · 1 min read · LW link

The Mind Is Not Designed For Thinking

CronoDAS · Mar 26, 2009, 9:57 PM
9 points
7 comments · 1 min read · LW link

[Link] Persistence of Long-Term Memory in Vitrified and Revived C. elegans worms

Rangi · May 24, 2015, 3:43 AM
35 points
8 comments · 1 min read · LW link

[Question] Can this model grade a test without knowing the answers?

Elizabeth · Aug 31, 2019, 12:53 AM
20 points
3 comments · 1 min read · LW link

Implications of Quantum Computing for Artificial Intelligence Alignment Research

Aug 22, 2019, 10:33 AM
24 points
3 comments · 13 min read · LW link

Citability of Lesswrong and the Alignment Forum

Leon Lang · Jan 8, 2023, 10:12 PM
48 points
2 comments · 1 min read · LW link

Link: Writing exercise closes the gender gap in university-level physics

Vladimir_Golovin · Nov 27, 2010, 4:28 PM
27 points
9 comments · 1 min read · LW link

Donohue, Levitt, Roe, and Wade: T-minus 20 years to a massive crime wave?

Paul Logan · Jul 3, 2022, 3:03 AM
−24 points
6 comments · 3 min read · LW link
(laulpogan.substack.com)

Over-encapsulation

PhilGoetz · Mar 25, 2010, 5:58 PM
29 points
56 comments · 3 min read · LW link

FHI paper published in Science: interventions against COVID-19

SoerenMind · Dec 16, 2020, 9:19 PM
119 points
0 comments · 3 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

Oct 23, 2023, 2:11 PM
20 points
2 comments · 5 min read · LW link
(far.ai)

NeurIPS ML Safety Workshop 2022

Dan H · Jul 26, 2022, 3:28 PM
72 points
2 comments · 1 min read · LW link
(neurips2022.mlsafety.org)

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford · Sep 6, 2022, 5:17 PM
11 points
0 comments · 1 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai · Jun 16, 2024, 1:01 PM
7 points
0 comments · 7 min read · LW link
(arxiv.org)

That one apocalyptic nuclear famine paper is bunk

Lao Mein · Oct 12, 2022, 3:33 AM
110 points
10 comments · 1 min read · LW link

Hope Function

gwern · Jul 1, 2012, 3:40 PM
38 points
8 comments · 1 min read · LW link

Rawls’s Veil of Ignorance Doesn’t Make Any Sense

Arjun Panickssery · Feb 24, 2024, 1:18 PM
10 points
9 comments · 1 min read · LW link

How You Can Gain Self Control Without “Self-Control”

spencerg · Mar 24, 2021, 11:38 PM
109 points
41 comments · 23 min read · LW link

Functional Trade-offs

weathersystems · May 19, 2021, 1:06 AM
5 points
0 comments · 6 min read · LW link

“Are Experiments Possible?” Seeds of Science call for reviewers

rogersbacon · Nov 2, 2022, 8:05 PM
8 points
0 comments · 1 min read · LW link

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

Ulisse Mini · Nov 13, 2022, 9:46 AM
12 points
2 comments · 1 min read · LW link
(arxiv.org)

How truthful is GPT-3? A benchmark for language models

Owain_Evans · Sep 16, 2021, 10:09 AM
58 points
24 comments · 6 min read · LW link

Walkthrough of the Tiling Agents for Self-Modifying AI paper

So8res · Dec 13, 2013, 3:23 AM
29 points
18 comments · 21 min read · LW link

Doing your good deed for the day

Scott Alexander · Oct 27, 2009, 12:45 AM
152 points
57 comments · 3 min read · LW link

[linkpost] Acquisition of Chess Knowledge in AlphaZero

Quintin Pope · Nov 23, 2021, 7:55 AM
8 points
1 comment · 1 min read · LW link

Demanding and Designing Aligned Cognitive Architectures

Koen.Holtman · Dec 21, 2021, 5:32 PM
8 points
5 comments · 5 min read · LW link

Even if you have a nail, not all hammers are the same

PhilGoetz · Mar 29, 2010, 6:09 PM
150 points
126 comments · 6 min read · LW link

Less Competition, More Meritocracy?

Zvi · Jan 20, 2019, 2:00 AM
85 points
19 comments · 20 min read · LW link · 3 reviews
(thezvi.wordpress.com)

A New Interpretation of the Marshmallow Test

elharo · Jul 5, 2013, 12:22 PM
119 points
25 comments · 2 min read · LW link

Good News for Immunostimulants

sarahconstantin · Apr 16, 2018, 4:10 PM
26 points
9 comments · 2 min read · LW link
(srconstantin.wordpress.com)

Let’s Read: Superhuman AI for multiplayer poker

Yuxi_Liu · Jul 14, 2019, 6:22 AM
56 points
6 comments · 8 min read · LW link

Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky · Jun 6, 2013, 8:24 PM
88 points
259 comments · 3 min read · LW link

The Vulnerable World Hypothesis (by Bostrom)

Ben Pace · Nov 6, 2018, 8:05 PM
50 points
17 comments · 4 min read · LW link
(nickbostrom.com)

DeepMind article: AI Safety Gridworlds

scarcegreengrass · Nov 30, 2017, 4:13 PM
25 points
6 comments · 1 min read · LW link
(deepmind.com)