
Academic Papers

Last edit: 9 Jul 2020 11:36 UTC by Kaj_Sotala

Posts either linking to, or summarizing, formal papers published elsewhere.

Some AI research areas and their relevance to existential safety

Andrew_Critch · 19 Nov 2020 3:18 UTC
204 points
37 comments · 50 min read · LW link · 2 reviews

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · 28 Nov 2023 19:56 UTC
64 points
30 comments · 11 min read · LW link

Thirty-three randomly selected bioethics papers

Rob Bensinger · 22 Mar 2021 21:38 UTC
115 points
46 comments · 50 min read · LW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · 27 Dec 2022 17:27 UTC
50 points
0 comments · 4 min read · LW link
(aizi.substack.com)

SSC Journal Club: AI Timelines

Scott Alexander · 8 Jun 2017 19:00 UTC
15 points
16 comments · 8 min read · LW link

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner · 4 Jun 2024 15:50 UTC
120 points
14 comments · 13 min read · LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
103 points
28 comments · 5 min read · LW link

New paper: Long-Term Trajectories of Human Civilization

Kaj_Sotala · 12 Aug 2018 9:10 UTC
33 points
1 comment · 2 min read · LW link
(kajsotala.fi)

Study on what makes people approve or condemn mind upload technology; references LW

Kaj_Sotala · 10 Jul 2018 17:14 UTC
22 points
0 comments · 2 min read · LW link
(www.nature.com)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala · 4 May 2018 8:56 UTC
14 points
1 comment · 1 min read · LW link
(arxiv.org)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala · 12 Feb 2018 12:30 UTC
45 points
4 comments · 6 min read · LW link
(kajsotala.fi)

Papers for 2017

Kaj_Sotala · 4 Jan 2018 13:30 UTC
12 points
2 comments · 2 min read · LW link
(kajsotala.fi)

Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering

Kaj_Sotala · 3 Jan 2018 13:57 UTC
13 points
0 comments · 1 min read · LW link
(www.informatica.si)

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

Kaj_Sotala · 3 Oct 2017 17:39 UTC
3 points
0 comments · 1 min read · LW link
(papers.ssrn.com)

[link] Why Self-Control Seems (but may not be) Limited

Kaj_Sotala · 20 Jan 2014 16:55 UTC
55 points
10 comments · 3 min read · LW link

Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower

Kaj_Sotala · 6 Dec 2013 9:54 UTC
34 points
18 comments · 5 min read · LW link

Fallacies as weak Bayesian evidence

Kaj_Sotala · 18 Mar 2012 3:53 UTC
88 points
42 comments · 10 min read · LW link

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

Kaj_Sotala · 8 Mar 2012 5:39 UTC
86 points
40 comments · 9 min read · LW link

[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading

avturchin · 27 Mar 2018 11:49 UTC
8 points
5 comments · 1 min read · LW link

IJMC Mind Uploading Special Issue published

Kaj_Sotala · 22 Jun 2012 11:58 UTC
19 points
12 comments · 1 min read · LW link

Bad news for uploading

PhilGoetz · 13 Dec 2012 23:32 UTC
19 points
6 comments · 1 min read · LW link

“Personal Identity and Uploading”, by Mark Walker

gwern · 7 Jan 2012 19:55 UTC
7 points
19 comments · 16 min read · LW link

“Ray Kurzweil and Uploading: Just Say No!”, Nick Agar

gwern · 2 Dec 2011 21:42 UTC
6 points
79 comments · 6 min read · LW link

Publication of “Anthropic Decision Theory”

Stuart_Armstrong · 20 Sep 2017 15:41 UTC
12 points
9 comments · 1 min read · LW link

Computerphile discusses MIRI’s “Logical Induction” paper

Parth Athley · 4 Oct 2018 16:00 UTC
43 points
2 comments · 1 min read · LW link
(www.youtube.com)

New paper from MIRI: “Toward idealized decision theory”

So8res · 16 Dec 2014 22:27 UTC
41 points
22 comments · 3 min read · LW link

Notes/blog posts on two recent MIRI papers

Quinn · 14 Jul 2013 23:11 UTC
35 points
3 comments · 1 min read · LW link

[LINK] International variation in IQ – the role of parasites

David_Gerard · 14 May 2012 12:08 UTC
10 points
49 comments · 1 min read · LW link

IQ Scores Fail to Predict Academic Performance in Children With Autism

InquilineKea · 18 Nov 2010 3:34 UTC
9 points
9 comments · 2 min read · LW link

[LINK] Neuroscientists Find That Status within Groups Can Affect IQ

cafesofie · 23 Jan 2012 19:52 UTC
6 points
5 comments · 1 min read · LW link

New report: Intelligence Explosion Microeconomics

Eliezer Yudkowsky · 29 Apr 2013 23:14 UTC
72 points
246 comments · 3 min read · LW link

The Chromatic Number of the Plane is at Least 5 - Aubrey de Grey

Scott Garrabrant · 11 Apr 2018 18:19 UTC
61 points
5 comments · 1 min read · LW link
(arxiv.org)

[Question] Why is pseudo-alignment “worse” than other ways ML can fail to generalize?

nostalgebraist · 18 Jul 2020 22:54 UTC
45 points
9 comments · 2 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala · 2 May 2020 7:35 UTC
43 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

Multiverse-wide Cooperation via Correlated Decision Making

Kaj_Sotala · 20 Aug 2017 12:01 UTC
5 points
2 comments · 1 min read · LW link
(foundational-research.org)

A technical note on bilinear layers for interpretability

Lee Sharkey · 8 May 2023 6:06 UTC
58 points
0 comments · 1 min read · LW link
(arxiv.org)

Papers, Please #1: Various Papers on Employment, Wages and Productivity

Zvi · 22 May 2023 12:00 UTC
42 points
2 comments · 8 min read · LW link
(thezvi.wordpress.com)

Aumann Agreement by Combat

roryokane · 5 Apr 2019 5:15 UTC
14 points
2 comments · 1 min read · LW link
(sigbovik.org)

“A Definition of Subjective Probability” by Anscombe and Aumann

JonahS · 24 Jan 2014 20:30 UTC
14 points
2 comments · 2 min read · LW link

Snyder-Beattie, Sandberg, Drexler & Bonsall (2020): The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare

Kaj_Sotala · 24 Nov 2020 10:36 UTC
83 points
20 comments · 2 min read · LW link
(www.liebertpub.com)

[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI

avturchin · 28 Aug 2018 21:32 UTC
13 points
2 comments · 1 min read · LW link

Comment on “Endogenous Epistemic Factionalization”

Zack_M_Davis · 20 May 2020 18:04 UTC
151 points
8 comments · 7 min read · LW link

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis · 29 Jun 2020 2:45 UTC
105 points
10 comments · 4 min read · LW link

Formal Solution to the Inner Alignment Problem

michaelcohen · 18 Feb 2021 14:51 UTC
49 points
123 comments · 2 min read · LW link

Deep limitations? Examining expert disagreement over deep learning

Richard_Ngo · 27 Jun 2021 0:55 UTC
18 points
6 comments · 1 min read · LW link
(link.springer.com)

Entropic boundary conditions towards safe artificial superintelligence

Santiago Nunez-Corrales · 20 Jul 2021 22:15 UTC
3 points
0 comments · 2 min read · LW link
(www.tandfonline.com)

Comment on “Deception as Cooperation”

Zack_M_Davis · 27 Nov 2021 4:04 UTC
23 points
4 comments · 7 min read · LW link

2021 AI Alignment Literature Review and Charity Comparison

Larks · 23 Dec 2021 14:06 UTC
168 points
28 comments · 73 min read · LW link

Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics

Charlie Steiner · 18 May 2022 20:52 UTC
50 points
8 comments · 14 min read · LW link

Paper: Forecasting world events with neural nets

1 Jul 2022 19:40 UTC
39 points
3 comments · 4 min read · LW link

Poster Session on AI Safety

Neil Crawford · 12 Nov 2022 3:50 UTC
7 points
6 comments · 1 min read · LW link

How to Read Papers Efficiently: Fast-then-Slow Three pass method

25 Feb 2023 2:56 UTC
36 points
4 comments · 4 min read · LW link
(ccr.sigcomm.org)

Effect heterogeneity and external validity in medicine

Anders_H · 25 Oct 2019 20:53 UTC
49 points
14 comments · 7 min read · LW link

Learning biases and rewards simultaneously

Rohin Shah · 6 Jul 2019 1:45 UTC
41 points
3 comments · 4 min read · LW link

Reasoning isn’t about logic (it’s about arguing)

Morendil · 14 Mar 2010 4:42 UTC
66 points
31 comments · 3 min read · LW link

Learning preferences by looking at the world

Rohin Shah · 12 Feb 2019 22:25 UTC
43 points
10 comments · 7 min read · LW link
(bair.berkeley.edu)

[Question] How Old is Smallpox?

Raemon · 10 Dec 2018 10:50 UTC
44 points
5 comments · 2 min read · LW link

Is Caviar a Risk Factor For Being a Millionaire?

Anders_H · 9 Dec 2016 16:27 UTC
67 points
9 comments · 1 min read · LW link

[Link] Computer improves its Civilization II gameplay by reading the manual

Kaj_Sotala · 13 Jul 2011 12:00 UTC
49 points
5 comments · 4 min read · LW link

Article Review: Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · 22 Dec 2022 18:16 UTC
13 points
4 comments · 6 min read · LW link
(aizi.substack.com)

A Summary Of Anthropic’s First Paper

Sam Ringer · 30 Dec 2021 0:48 UTC
85 points
1 comment · 8 min read · LW link

Generalizing Experimental Results by Leveraging Knowledge of Mechanisms

Carlos_Cinelli · 11 Dec 2019 20:39 UTC
50 points
5 comments · 1 min read · LW link

New paper: Corrigibility with Utility Preservation

Koen.Holtman · 6 Aug 2019 19:04 UTC
44 points
11 comments · 2 min read · LW link

Memory, nutrition, motivation, and genes

PhilGoetz · 26 Feb 2013 5:25 UTC
24 points
12 comments · 2 min read · LW link

Human-AI Collaboration

Rohin Shah · 22 Oct 2019 6:32 UTC
42 points
7 comments · 2 min read · LW link
(bair.berkeley.edu)

“Everything is Correlated”: An Anthology of the Psychology Debate

gwern · 27 Apr 2019 13:48 UTC
41 points
2 comments · 1 min read · LW link
(www.gwern.net)

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · 12 Feb 2024 0:56 UTC
55 points
13 comments · 3 min read · LW link

A discussion of the paper, “Large Language Models are Zero-Shot Reasoners”

HiroSakuraba · 26 May 2022 15:55 UTC
7 points
0 comments · 4 min read · LW link

David Chalmers’ “The Singularity: A Philosophical Analysis”

lukeprog · 29 Jan 2011 2:52 UTC
55 points
203 comments · 4 min read · LW link

Let’s Discuss Functional Decision Theory

Chris_Leong · 23 Jul 2018 7:24 UTC
29 points
18 comments · 1 min read · LW link

Introducing Corrigibility (an FAI research subfield)

So8res · 20 Oct 2014 21:09 UTC
52 points
28 comments · 3 min read · LW link

Counterfactual outcome state transition parameters

Anders_H · 27 Jul 2018 21:13 UTC
37 points
1 comment · 6 min read · LW link

How to escape from your sandbox and from your hardware host

PhilGoetz · 31 Jul 2015 17:26 UTC
43 points
28 comments · 1 min read · LW link

Oracle paper

Stuart_Armstrong · 13 Dec 2017 14:59 UTC
12 points
7 comments · 1 min read · LW link

New paper: The Incentives that Shape Behaviour

RyanCarey · 23 Jan 2020 19:07 UTC
23 points
5 comments · 1 min read · LW link
(arxiv.org)

Dissolving the Fermi Paradox, and what reflection it provides

Jan_Kulveit · 30 Jun 2018 16:35 UTC
28 points
22 comments · 1 min read · LW link
(arxiv.org)

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

DragonGod · 6 Dec 2017 6:01 UTC
13 points
4 comments · 1 min read · LW link
(arxiv.org)

Summary: Surreal Decisions

Chris_Leong · 27 Nov 2018 14:15 UTC
29 points
20 comments · 3 min read · LW link

How Big a Deal are MatMul-Free Transformers?

JustisMills · 27 Jun 2024 22:28 UTC
19 points
6 comments · 5 min read · LW link
(justismills.substack.com)

To Learn Critical Thinking, Study Critical Thinking

gwern · 7 Jul 2012 23:50 UTC
41 points
16 comments · 11 min read · LW link

Secret Collusion: Will We Know When to Unplug AI?

16 Sep 2024 16:07 UTC
55 points
7 comments · 31 min read · LW link

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david reinstein · 28 Sep 2024 0:32 UTC
6 points
0 comments · 2 min read · LW link

[Question] Searching for Impossibility Results or No-Go Theorems for provable safety.

Maelstrom · 27 Sep 2024 20:12 UTC
2 points
1 comment · 1 min read · LW link

An Overview of Sparks of Artificial General Intelligence: Early experiments with GPT-4

Annapurna · 27 Mar 2023 13:44 UTC
10 points
0 comments · 7 min read · LW link
(jorgevelez.substack.com)

Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”

Klara Helene Nielsen · 20 Jul 2023 17:08 UTC
4 points
1 comment · 2 min read · LW link
(journals.sagepub.com)

The Physiology of Willpower

pjeby · 18 Jun 2009 4:11 UTC
25 points
36 comments · 1 min read · LW link

Experts vs. parents

PhilGoetz · 29 Sep 2010 16:48 UTC
24 points
23 comments · 1 min read · LW link

The Mind Is Not Designed For Thinking

CronoDAS · 26 Mar 2009 21:57 UTC
9 points
7 comments · 1 min read · LW link

[Link] Persistence of Long-Term Memory in Vitrified and Revived C. elegans worms

Rangi · 24 May 2015 3:43 UTC
35 points
8 comments · 1 min read · LW link

[Question] Can this model grade a test without knowing the answers?

Elizabeth · 31 Aug 2019 0:53 UTC
20 points
3 comments · 1 min read · LW link

Implications of Quantum Computing for Artificial Intelligence Alignment Research

22 Aug 2019 10:33 UTC
24 points
3 comments · 13 min read · LW link

Citability of Lesswrong and the Alignment Forum

Leon Lang · 8 Jan 2023 22:12 UTC
48 points
2 comments · 1 min read · LW link

Link: Writing exercise closes the gender gap in university-level physics

Vladimir_Golovin · 27 Nov 2010 16:28 UTC
27 points
9 comments · 1 min read · LW link

Donohue, Levitt, Roe, and Wade: T-minus 20 years to a massive crime wave?

Paul Logan · 3 Jul 2022 3:03 UTC
−24 points
6 comments · 3 min read · LW link
(laulpogan.substack.com)

Over-encapsulation

PhilGoetz · 25 Mar 2010 17:58 UTC
29 points
56 comments · 3 min read · LW link

FHI paper published in Science: interventions against COVID-19

SoerenMind · 16 Dec 2020 21:19 UTC
119 points
0 comments · 3 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

NeurIPS ML Safety Workshop 2022

Dan H · 26 Jul 2022 15:28 UTC
72 points
2 comments · 1 min read · LW link
(neurips2022.mlsafety.org)

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford · 6 Sep 2022 17:17 UTC
11 points
0 comments · 1 min read · LW link

That one apocalyptic nuclear famine paper is bunk

Lao Mein · 12 Oct 2022 3:33 UTC
110 points
10 comments · 1 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai · 16 Jun 2024 13:01 UTC
7 points
0 comments · 7 min read · LW link
(arxiv.org)

Hope Function

gwern · 1 Jul 2012 15:40 UTC
38 points
8 comments · 1 min read · LW link

Rawls’s Veil of Ignorance Doesn’t Make Any Sense

Arjun Panickssery · 24 Feb 2024 13:18 UTC
10 points
9 comments · 1 min read · LW link

How You Can Gain Self Control Without “Self-Control”

spencerg · 24 Mar 2021 23:38 UTC
109 points
41 comments · 23 min read · LW link

Functional Trade-offs

weathersystems · 19 May 2021 1:06 UTC
5 points
0 comments · 6 min read · LW link

“Are Experiments Possible?” Seeds of Science call for reviewers

rogersbacon · 2 Nov 2022 20:05 UTC
8 points
0 comments · 1 min read · LW link

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

Ulisse Mini · 13 Nov 2022 9:46 UTC
12 points
2 comments · 1 min read · LW link
(arxiv.org)

How truthful is GPT-3? A benchmark for language models

Owain_Evans · 16 Sep 2021 10:09 UTC
58 points
24 comments · 6 min read · LW link

Walkthrough of the Tiling Agents for Self-Modifying AI paper

So8res · 13 Dec 2013 3:23 UTC
29 points
18 comments · 21 min read · LW link

Doing your good deed for the day

Scott Alexander · 27 Oct 2009 0:45 UTC
152 points
57 comments · 3 min read · LW link

[linkpost] Acquisition of Chess Knowledge in AlphaZero

Quintin Pope · 23 Nov 2021 7:55 UTC
8 points
1 comment · 1 min read · LW link

Demanding and Designing Aligned Cognitive Architectures

Koen.Holtman · 21 Dec 2021 17:32 UTC
8 points
5 comments · 5 min read · LW link

Even if you have a nail, not all hammers are the same

PhilGoetz · 29 Mar 2010 18:09 UTC
150 points
126 comments · 6 min read · LW link

Less Competition, More Meritocracy?

Zvi · 20 Jan 2019 2:00 UTC
85 points
19 comments · 20 min read · LW link · 3 reviews
(thezvi.wordpress.com)

A New Interpretation of the Marshmallow Test

elharo · 5 Jul 2013 12:22 UTC
119 points
25 comments · 2 min read · LW link

Good News for Immunostimulants

sarahconstantin · 16 Apr 2018 16:10 UTC
26 points
9 comments · 2 min read · LW link
(srconstantin.wordpress.com)

Let’s Read: Superhuman AI for multiplayer poker

Yuxi_Liu · 14 Jul 2019 6:22 UTC
56 points
6 comments · 8 min read · LW link

Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky · 6 Jun 2013 20:24 UTC
88 points
259 comments · 3 min read · LW link

The Vulnerable World Hypothesis (by Bostrom)

Ben Pace · 6 Nov 2018 20:05 UTC
50 points
17 comments · 4 min read · LW link
(nickbostrom.com)

DeepMind article: AI Safety Gridworlds

scarcegreengrass · 30 Nov 2017 16:13 UTC
25 points
6 comments · 1 min read · LW link
(deepmind.com)

Claims & Assumptions made in Eternity in Six Hours

Ruby · 8 May 2019 23:11 UTC
50 points
7 comments · 3 min read · LW link

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv

DragonGod · 21 Nov 2019 1:18 UTC
52 points
4 comments · 1 min read · LW link
(arxiv.org)