Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
60 points
8 comments · 20 min read · LW link

Training goals for large language models

Johannes Treutlein · Jul 18, 2022, 7:09 AM
28 points
5 comments · 19 min read · LW link

A distillation of Evan Hubinger’s training stories (for SERI MATS)

Daphne_W · Jul 18, 2022, 3:38 AM
15 points
1 comment · 10 min read · LW link

Forecasting ML Benchmarks in 2023

jsteinhardt · Jul 18, 2022, 2:50 AM
36 points
20 comments · 12 min read · LW link
(bounded-regret.ghost.io)

What should you change in response to an “emergency”? And AI risk

AnnaSalamon · Jul 18, 2022, 1:11 AM
338 points
60 comments · 6 min read · LW link · 1 review

Deception?! I ain’t got time for that!

Paul Colognese · Jul 18, 2022, 12:06 AM
55 points
5 comments · 13 min read · LW link

How Interpretability can be Impactful

Connall Garrod · Jul 18, 2022, 12:06 AM
18 points
0 comments · 37 min read · LW link

Why you might expect homogeneous take-off: evidence from ML research

Andrei Alexandru · Jul 17, 2022, 8:31 PM
24 points
0 comments · 10 min read · LW link

Examples of AI Increasing AI Progress

TW123 · Jul 17, 2022, 8:06 PM
107 points
14 comments · 1 min read · LW link

Four questions I ask AI safety researchers

Orpheus16 · Jul 17, 2022, 5:25 PM
17 points
0 comments · 1 min read · LW link

Why I Think Abrupt AI Takeoff

lincolnquirk · Jul 17, 2022, 5:04 PM
14 points
6 comments · 1 min read · LW link

Culture wars in riddle format

Malmesbury · Jul 17, 2022, 2:51 PM
7 points
28 comments · 3 min read · LW link

Bangalore LW/ACX Meetup in person

Vyakart · Jul 17, 2022, 6:53 AM
1 point
0 comments · 1 min read · LW link

Resolve Cycles

CFAR!Duncan · Jul 16, 2022, 11:17 PM
140 points
8 comments · 10 min read · LW link

Alignment as Game Design

Shoshannah Tekofsky · Jul 16, 2022, 10:36 PM
11 points
7 comments · 2 min read · LW link

Risk Management from a Climbers Perspective

Annapurna · Jul 16, 2022, 9:14 PM
5 points
0 comments · 6 min read · LW link
(jorgevelez.substack.com)

Cognitive Instability, Physicalism, and Free Will

dadadarren · Jul 16, 2022, 1:13 PM
5 points
27 comments · 2 min read · LW link
(www.sleepingbeautyproblem.com)

All AGI safety questions welcome (especially basic ones) [July 2022]

Jul 16, 2022, 12:57 PM
84 points
132 comments · 3 min read · LW link

QNR Prospects

PeterMcCluskey · Jul 16, 2022, 2:03 AM
40 points
3 comments · 8 min read · LW link
(www.bayesianinvestor.com)

To-do waves

Paweł Sysiak · Jul 16, 2022, 1:19 AM
3 points
0 comments · 3 min read · LW link

Moneypumping Bryan Caplan’s Belief in Free Will

Morpheus · Jul 16, 2022, 12:46 AM
5 points
9 comments · 1 min read · LW link

A summary of every “Highlights from the Sequences” post

Orpheus16 · Jul 15, 2022, 11:01 PM
98 points
7 comments · 17 min read · LW link

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov · Jul 15, 2022, 9:47 PM
102 points
18 comments · 6 min read · LW link

Comfort Zone Exploration

CFAR!Duncan · Jul 15, 2022, 9:18 PM
51 points
2 comments · 12 min read · LW link

A time-invariant version of Laplace’s rule

Jul 15, 2022, 7:28 PM
72 points
13 comments · 17 min read · LW link
(epochai.org)

An attempt to break circularity in science

fryolysis · Jul 15, 2022, 6:32 PM
3 points
5 comments · 1 min read · LW link

A story about a duplicitous API

LiLiLi · Jul 15, 2022, 6:26 PM
2 points
0 comments · 1 min read · LW link

Highlights from the memoirs of Vannevar Bush

jasoncrawford · Jul 15, 2022, 6:08 PM
11 points
0 comments · 13 min read · LW link
(rootsofprogress.org)

Notes on Learning the Prior

carboniferous_umbraculum · Jul 15, 2022, 5:28 PM
25 points
2 comments · 25 min read · LW link

Review of The Engines of Cognition

William Gasarch · Jul 15, 2022, 2:13 PM
14 points
5 comments · 15 min read · LW link

A review of Nate Hilger’s The Parent Trap

David Hugh-Jones · Jul 15, 2022, 9:30 AM
15 points
8 comments · 4 min read · LW link
(wyclif.substack.com)

Musings on the Human Objective Function

Michael Soareverix · Jul 15, 2022, 7:13 AM
3 points
0 comments · 3 min read · LW link

Peter Singer’s first published piece on AI

Fai · Jul 15, 2022, 6:18 AM
20 points
5 comments · 1 min read · LW link
(link.springer.com)

Don’t use ‘infohazard’ for collectively destructive info

Eliezer Yudkowsky · Jul 15, 2022, 5:13 AM
86 points
33 comments · 1 min read · LW link · 2 reviews
(www.facebook.com)

Upcoming heatwave: advice

stavros · Jul 15, 2022, 5:03 AM
16 points
13 comments · 3 min read · LW link

A note about differential technological development

So8res · Jul 15, 2022, 4:46 AM
197 points
33 comments · 6 min read · LW link

Inward and outward steelmanning

Q Home · Jul 14, 2022, 11:32 PM
13 points
6 comments · 18 min read · LW link

Potato diet: A post mortem and an answer to SMTM’s article

Épiphanie Gédéon · Jul 14, 2022, 11:18 PM
48 points
34 comments · 16 min read · LW link

Proposed Orthogonality Theses #2-5

rjbg · Jul 14, 2022, 10:59 PM
8 points
0 comments · 2 min read · LW link

Better Quiddler

jefftk · Jul 14, 2022, 5:40 PM
17 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · Jul 14, 2022, 4:59 PM
114 points
15 comments · 33 min read · LW link

Covid 7/14/22: BA.2.75 Plus Tax

Zvi · Jul 14, 2022, 2:40 PM
39 points
9 comments · 8 min read · LW link
(thezvi.wordpress.com)

Criticism of EA Criticism Contest

Zvi · Jul 14, 2022, 2:30 PM
108 points
17 comments · 31 min read · LW link · 1 review
(thezvi.wordpress.com)

Humans provide an untapped wealth of evidence about alignment

Jul 14, 2022, 2:31 AM
212 points
94 comments · 9 min read · LW link · 1 review

[Question] Wacky, risky, anti-inductive intelligence-enhancement methods?

Nicholas / Heather Kross · Jul 14, 2022, 1:40 AM
20 points
30 comments · 1 min read · LW link

[Question] How to impress students with recent advances in ML?

Charbel-Raphaël · Jul 14, 2022, 12:03 AM
12 points
2 comments · 1 min read · LW link

Notes on Love

David Gross · Jul 13, 2022, 11:35 PM
18 points
3 comments · 29 min read · LW link

Deep learning curriculum for large language model alignment

Jacob_Hilton · Jul 13, 2022, 9:58 PM
57 points
3 comments · 1 min read · LW link
(github.com)

Artificial Sandwiching: When can we test scalable alignment protocols without humans?

Sam Bowman · Jul 13, 2022, 9:14 PM
42 points
6 comments · 5 min read · LW link

[Question] Any tips for eliciting one’s own latent knowledge?

MSRayne · Jul 13, 2022, 9:12 PM
16 points
20 comments · 2 min read · LW link