Aversion Factoring

CFAR!Duncan · Jul 7, 2022, 4:09 PM
79 points
1 comment · 8 min read · LW link

A Pattern Language For Rationality

Vaniver · Jul 5, 2022, 7:08 PM
75 points
14 comments · 15 min read · LW link

Which values are stable under ontology shifts?

Richard_Ngo · Jul 23, 2022, 2:40 AM
74 points
48 comments · 3 min read · LW link
(thinkingcomplete.blogspot.com)

Principles of Privacy for Alignment Research

johnswentworth · Jul 27, 2022, 7:53 PM
73 points
31 comments · 7 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · Jul 29, 2022, 6:59 PM
72 points
3 comments · 16 min read · LW link

NeurIPS ML Safety Workshop 2022

Dan H · Jul 26, 2022, 3:28 PM
72 points
2 comments · 1 min read · LW link
(neurips2022.mlsafety.org)

A time-invariant version of Laplace’s rule

Jul 15, 2022, 7:28 PM
72 points
13 comments · 17 min read · LW link
(epochai.org)

Avoid the abbreviation “FLOPs” – use “FLOP” or “FLOP/s” instead

Daniel_Eth · Jul 10, 2022, 10:44 AM
70 points
13 comments · 1 min read · LW link

Cognitive Risks of Adolescent Binge Drinking

Jul 20, 2022, 9:10 PM
70 points
12 comments · 10 min read · LW link
(acesounderglass.com)

Taste & Shaping

CFAR!Duncan · Jul 10, 2022, 5:50 AM
67 points
1 comment · 16 min read · LW link

My vision of a good future, part I

Jeffrey Ladish · Jul 6, 2022, 1:23 AM
66 points
18 comments · 9 min read · LW link

Curating “The Epistemic Sequences” (list v.0.1)

Andrew_Critch · Jul 23, 2022, 10:17 PM
65 points
12 comments · 7 min read · LW link

Applications are open for CFAR workshops in Prague this fall!

John Steidley · Jul 19, 2022, 6:29 PM
64 points
3 comments · 2 min read · LW link

What’s next for instrumental rationality?

Andrew_Critch · Jul 23, 2022, 10:55 PM
63 points
7 comments · 1 min read · LW link

Introducing the Fund for Alignment Research (We’re Hiring!)

Jul 6, 2022, 2:07 AM
62 points
0 comments · 4 min read · LW link

Response to Blake Richards: AGI, generality, alignment, & loss functions

Steven Byrnes · Jul 12, 2022, 1:56 PM
62 points
9 comments · 15 min read · LW link

My Most Likely Reason to Die Young is AI X-Risk

AISafetyIsNotLongtermist · Jul 4, 2022, 5:08 PM
61 points
24 comments · 4 min read · LW link
(forum.effectivealtruism.org)

Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
59 points
8 comments · 20 min read · LW link

Double Crux

CFAR!Duncan · Jul 24, 2022, 6:34 AM
58 points
9 comments · 11 min read · LW link

When Giving People Money Doesn’t Help

Zvi · Jul 7, 2022, 1:00 PM
58 points
12 comments · 10 min read · LW link
(thezvi.wordpress.com)

A Bias Against Altruism

Lone Pine · Jul 23, 2022, 8:44 PM
58 points
30 comments · 2 min read · LW link

Deep learning curriculum for large language model alignment

Jacob_Hilton · Jul 13, 2022, 9:58 PM
57 points
3 comments · 1 min read · LW link
(github.com)

The Reader’s Guide to Optimal Monetary Policy

Ege Erdil · Jul 25, 2022, 3:10 PM
57 points
10 comments · 14 min read · LW link

Deception?! I ain’t got time for that!

Paul Colognese · Jul 18, 2022, 12:06 AM
55 points
5 comments · 13 min read · LW link

[AN #172] Sorry for the long hiatus!

Rohin Shah · Jul 5, 2022, 6:20 AM
54 points
0 comments · 3 min read · LW link
(mailchi.mp)

Don’t take the organizational chart literally

lc · Jul 21, 2022, 12:56 AM
54 points
21 comments · 4 min read · LW link

Comfort Zone Exploration

CFAR!Duncan · Jul 15, 2022, 9:18 PM
51 points
2 comments · 12 min read · LW link

Outer vs inner misalignment: three framings

Richard_Ngo · Jul 6, 2022, 7:46 PM
51 points
5 comments · 9 min read · LW link

Procedural Executive Function, Part 1

DaystarEld · Jul 4, 2022, 6:51 PM
50 points
8 comments · 14 min read · LW link
(daystareld.com)

Race Along Rashomon Ridge

Jul 7, 2022, 3:20 AM
50 points
15 comments · 8 min read · LW link

Making decisions using multiple worldviews

Richard_Ngo · Jul 13, 2022, 7:15 PM
50 points
10 comments · 11 min read · LW link

Acceptability Verification: A Research Agenda

Jul 12, 2022, 8:11 PM
50 points
0 comments · 1 min read · LW link
(docs.google.com)

Report from a civilizational observer on Earth

owencb · Jul 9, 2022, 5:26 PM
49 points
12 comments · 6 min read · LW link

Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding

Vael Gates · Jul 28, 2022, 9:29 PM
49 points
3 comments · 6 min read · LW link

Potato diet: A post mortem and an answer to SMTM’s article

Épiphanie Gédéon · Jul 14, 2022, 11:18 PM
48 points
34 comments · 16 min read · LW link

The Alignment Problem

lsusr · Jul 11, 2022, 3:03 AM
46 points
18 comments · 3 min read · LW link

Babysitting as Parenting Trial?

jefftk · Jul 7, 2022, 1:20 PM
46 points
19 comments · 3 min read · LW link
(www.jefftk.com)

The Most Important Century: The Animation

Jul 24, 2022, 8:58 PM
46 points
2 comments · 20 min read · LW link
(youtu.be)

Deontological Evil

lsusr · Jul 2, 2022, 6:57 AM
44 points
4 comments · 2 min read · LW link

Tarnished Guy who Puts a Num on it

Jacob Falkovich · Jul 6, 2022, 6:05 PM
44 points
11 comments · 4 min read · LW link

Eavesdropping on Aliens: A Data Decoding Challenge

anonymousaisafety · Jul 24, 2022, 4:35 AM
44 points
9 comments · 4 min read · LW link

Goal Alignment Is Robust To the Sharp Left Turn

Thane Ruthenis · Jul 13, 2022, 8:23 PM
43 points
16 comments · 4 min read · LW link

Systemization

CFAR!Duncan · Jul 11, 2022, 6:39 PM
42 points
5 comments · 12 min read · LW link

Bucket Errors

CFAR!Duncan · Jul 29, 2022, 6:50 PM
42 points
7 comments · 11 min read · LW link

Safety considerations for online generative modeling

Sam Marks · Jul 7, 2022, 6:31 PM
42 points
9 comments · 14 min read · LW link

Artificial Sandwiching: When can we test scalable alignment protocols without humans?

Sam Bowman · Jul 13, 2022, 9:14 PM
42 points
6 comments · 5 min read · LW link

Meiosis is all you need

Metacelsus · Jul 1, 2022, 7:39 AM
41 points
3 comments · 2 min read · LW link
(denovo.substack.com)

The curious case of Pretty Good human inner/outer alignment

PavleMiha · Jul 5, 2022, 7:04 PM
41 points
45 comments · 4 min read · LW link

QNR Prospects

PeterMcCluskey · Jul 16, 2022, 2:03 AM
40 points
3 comments · 8 min read · LW link
(www.bayesianinvestor.com)

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan H · Jul 2, 2022, 12:09 AM
40 points
0 comments · 1 min read · LW link
(arxiv.org)