Page 2
- AI Safety Cheatsheet / Quick Reference · Zohar Jackson · Jul 20, 2022, 9:39 AM · 3 points · 0 comments · 1 min read · LW link (github.com)
- Getting Unstuck on Counterfactuals · Chris_Leong · Jul 20, 2022, 5:31 AM · 7 points · 1 comment · 2 min read · LW link
- Pitfalls with Proofs · scasper · Jul 19, 2022, 10:21 PM · 19 points · 21 comments · 8 min read · LW link
- A daily routine I do for my AI safety research work · scasper · Jul 19, 2022, 9:58 PM · 22 points · 7 comments · 1 min read · LW link
- Progress links and tweets, 2022-07-19 · jasoncrawford · Jul 19, 2022, 8:50 PM · 11 points · 1 comment · 1 min read · LW link (rootsofprogress.org)
- Applications are open for CFAR workshops in Prague this fall! · John Steidley · Jul 19, 2022, 6:29 PM · 64 points · 3 comments · 2 min read · LW link
- Sexual Abuse attitudes might be infohazardous · Pseudonymous Otter · Jul 19, 2022, 6:06 PM · 256 points · 72 comments · 1 min read · LW link
- Spending Update 2022 · jefftk · Jul 19, 2022, 2:10 PM · 28 points · 0 comments · 3 min read · LW link (www.jefftk.com)
- Abram Demski’s ELK thoughts and proposal—distillation · Rubi J. Hudson · Jul 19, 2022, 6:57 AM · 19 points · 8 comments · 16 min read · LW link
- Bounded complexity of solving ELK and its implications · Rubi J. Hudson · Jul 19, 2022, 6:56 AM · 11 points · 4 comments · 18 min read · LW link
- Help ARC evaluate capabilities of current language models (still need people) · Beth Barnes · Jul 19, 2022, 4:55 AM · 95 points · 6 comments · 2 min read · LW link
- A Critique of AI Alignment Pessimism · ExCeph · Jul 19, 2022, 2:28 AM · 9 points · 1 comment · 9 min read · LW link
- Ars D&D.Sci: Mysteries of Mana Evaluation & Ruleset · aphyer · Jul 19, 2022, 2:06 AM · 33 points · 4 comments · 5 min read · LW link
- Marburg Virus Pandemic Prediction Checklist · DirectedEvolution · Jul 18, 2022, 11:15 PM · 30 points · 0 comments · 5 min read · LW link
- At what point will we know if Eliezer’s predictions are right or wrong? · anonymous123456 · Jul 18, 2022, 10:06 PM · 5 points · 6 comments · 1 min read · LW link
- Modelling Deception · Garrett Baker · Jul 18, 2022, 9:21 PM · 15 points · 0 comments · 7 min read · LW link
- Are Intelligence and Generality Orthogonal? · cubefox · Jul 18, 2022, 8:07 PM · 18 points · 16 comments · 1 min read · LW link
- Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover · Ajeya Cotra · Jul 18, 2022, 7:06 PM · 368 points · 95 comments · 75 min read · LW link · 1 review
- Turning Some Inconsistent Preferences into Consistent Ones · niplav · Jul 18, 2022, 6:40 PM · 23 points · 5 comments · 12 min read · LW link
- Addendum: A non-magical explanation of Jeffrey Epstein · lc · Jul 18, 2022, 5:40 PM · 81 points · 21 comments · 11 min read · LW link
- Launching a new progress institute, seeking a CEO · jasoncrawford · Jul 18, 2022, 4:58 PM · 25 points · 2 comments · 3 min read · LW link (rootsofprogress.org)
- Machine Learning Model Sizes and the Parameter Gap [abridged] · Pablo Villalobos · Jul 18, 2022, 4:51 PM · 20 points · 0 comments · 1 min read · LW link (epochai.org)
- Quantilizers and Generative Models · Adam Jermyn · Jul 18, 2022, 4:32 PM · 24 points · 5 comments · 4 min read · LW link
- AI Hiroshima (Does A Vivid Example Of Destruction Forestall Apocalypse?) · Sable · Jul 18, 2022, 12:06 PM · 4 points · 4 comments · 2 min read · LW link
- How the ---- did Feynman Get Here !? · George3d6 · Jul 18, 2022, 9:43 AM · 8 points · 8 comments · 3 min read · LW link (www.epistem.ink)
- Conditioning Generative Models for Alignment · Jozdien · Jul 18, 2022, 7:11 AM · 60 points · 8 comments · 20 min read · LW link
- Training goals for large language models · Johannes Treutlein · Jul 18, 2022, 7:09 AM · 28 points · 5 comments · 19 min read · LW link
- A distillation of Evan Hubinger’s training stories (for SERI MATS) · Daphne_W · Jul 18, 2022, 3:38 AM · 15 points · 1 comment · 10 min read · LW link
- Forecasting ML Benchmarks in 2023 · jsteinhardt · Jul 18, 2022, 2:50 AM · 36 points · 20 comments · 12 min read · LW link (bounded-regret.ghost.io)
- What should you change in response to an “emergency”? And AI risk · AnnaSalamon · Jul 18, 2022, 1:11 AM · 339 points · 60 comments · 6 min read · LW link · 1 review
- Deception?! I ain’t got time for that! · Paul Colognese · Jul 18, 2022, 12:06 AM · 55 points · 5 comments · 13 min read · LW link
- How Interpretability can be Impactful · Connall Garrod · Jul 18, 2022, 12:06 AM · 18 points · 0 comments · 37 min read · LW link
- Why you might expect homogeneous take-off: evidence from ML research · Andrei Alexandru · Jul 17, 2022, 8:31 PM · 24 points · 0 comments · 10 min read · LW link
- Examples of AI Increasing AI Progress · TW123 · Jul 17, 2022, 8:06 PM · 107 points · 14 comments · 1 min read · LW link
- Four questions I ask AI safety researchers · Orpheus16 · Jul 17, 2022, 5:25 PM · 17 points · 0 comments · 1 min read · LW link
- Why I Think Abrupt AI Takeoff · lincolnquirk · Jul 17, 2022, 5:04 PM · 14 points · 6 comments · 1 min read · LW link
- Culture wars in riddle format · Malmesbury · Jul 17, 2022, 2:51 PM · 7 points · 28 comments · 3 min read · LW link
- Bangalore LW/ACX Meetup in person · Vyakart · Jul 17, 2022, 6:53 AM · 1 point · 0 comments · 1 min read · LW link
- Resolve Cycles · CFAR!Duncan · Jul 16, 2022, 11:17 PM · 140 points · 8 comments · 10 min read · LW link
- Alignment as Game Design · Shoshannah Tekofsky · Jul 16, 2022, 10:36 PM · 11 points · 7 comments · 2 min read · LW link
- Risk Management from a Climbers Perspective · Annapurna · Jul 16, 2022, 9:14 PM · 5 points · 0 comments · 6 min read · LW link (jorgevelez.substack.com)
- Cognitive Instability, Physicalism, and Free Will · dadadarren · Jul 16, 2022, 1:13 PM · 5 points · 27 comments · 2 min read · LW link (www.sleepingbeautyproblem.com)
- All AGI safety questions welcome (especially basic ones) [July 2022] · plex and Robert Miles · Jul 16, 2022, 12:57 PM · 84 points · 132 comments · 3 min read · LW link
- QNR Prospects · PeterMcCluskey · Jul 16, 2022, 2:03 AM UTC · 40 points · 3 comments · 8 min read · LW link (www.bayesianinvestor.com)
- To-do waves · Paweł Sysiak · Jul 16, 2022, 1:19 AM UTC · 3 points · 0 comments · 3 min read · LW link
- Moneypumping Bryan Caplan’s Belief in Free Will · Morpheus · Jul 16, 2022, 12:46 AM UTC · 5 points · 9 comments · 1 min read · LW link
- A summary of every “Highlights from the Sequences” post · Orpheus16 · Jul 15, 2022, 11:01 PM UTC · 98 points · 7 comments · 17 min read · LW link
- Safety Implications of LeCun’s path to machine intelligence · Ivan Vendrov · Jul 15, 2022, 9:47 PM UTC · 102 points · 18 comments · 6 min read · LW link
- Comfort Zone Exploration · CFAR!Duncan · Jul 15, 2022, 9:18 PM UTC · 51 points · 2 comments · 12 min read · LW link
- A time-invariant version of Laplace’s rule · Jsevillamol and Ege Erdil · Jul 15, 2022, 7:28 PM UTC · 72 points · 13 comments · 17 min read · LW link (epochai.org)