Takeaways from a survey on AI alignment resources

DanielFilan · Nov 5, 2022, 11:40 PM
73 points
10 comments · 6 min read · LW link · 1 review
(danielfilan.com)

Distinguishing test from training

So8res · Nov 29, 2022, 9:41 PM
72 points
11 comments · 6 min read · LW link

My take on Jacob Cannell’s take on AGI safety

Steven Byrnes · Nov 28, 2022, 2:01 PM
72 points
15 comments · 30 min read · LW link · 1 review

Don’t design agents which exploit adversarial inputs

Nov 18, 2022, 1:48 AM
72 points
64 comments · 12 min read · LW link

Update to Mysteries of mode collapse: text-davinci-002 not RLHF

janus · Nov 19, 2022, 11:51 PM
71 points
8 comments · 2 min read · LW link

Career Scouting: Dentistry

koratkar · Nov 20, 2022, 3:55 PM
69 points
5 comments · 5 min read · LW link
(careerscouting.substack.com)

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky · Nov 29, 2022, 7:30 PM
69 points
10 comments · 33 min read · LW link
(www.cold-takes.com)

Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda · Nov 1, 2022, 11:56 PM
69 points
16 comments · 1 min read · LW link
(youtu.be)

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Robert Miles · Nov 1, 2022, 11:23 PM
68 points
105 comments · 2 min read · LW link

Deontology and virtue ethics as “effective theories” of consequentialist ethics

Jan_Kulveit · Nov 17, 2022, 2:11 PM
68 points
9 comments · LW link · 1 review

2022 LessWrong Census?

SurfingOrca · Nov 7, 2022, 5:16 AM
67 points
13 comments · 1 min read · LW link

The First Filter

Nov 26, 2022, 7:37 PM
67 points
5 comments · 1 min read · LW link

Against “Classic Style”

Cleo Nardo · Nov 23, 2022, 10:10 PM
67 points
30 comments · 4 min read · LW link

Clarifying wireheading terminology

leogao · Nov 24, 2022, 4:53 AM
66 points
6 comments · 1 min read · LW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · Nov 29, 2022, 6:23 AM
62 points
41 comments · 15 min read · LW link

Announcing AI safety Mentors and Mentees

Marius Hobbhahn · Nov 23, 2022, 3:21 PM
62 points
7 comments · 10 min read · LW link

Against a General Factor of Doom

Jeffrey Heninger · Nov 23, 2022, 4:50 PM
61 points
19 comments · 4 min read · LW link · 1 review
(aiimpacts.org)

Could a single alien message destroy us?

Nov 25, 2022, 7:32 AM
61 points
23 comments · 6 min read · LW link
(youtu.be)

FTX will probably be sold at a steep discount. What we know and some forecasts on what will happen next

Nathan Young · Nov 9, 2022, 2:14 AM
60 points
21 comments · LW link

The Least Controversial Application of Geometric Rationality

Scott Garrabrant · Nov 25, 2022, 4:50 PM
60 points
22 comments · 4 min read · LW link

New Frontiers in Mojibake

Adam Scherlis · Nov 26, 2022, 2:37 AM
60 points
7 comments · 6 min read · LW link · 1 review
(adam.scherlis.com)

What’s the Deal with Elon Musk and Twitter?

Zvi · Nov 7, 2022, 1:50 PM
60 points
13 comments · 31 min read · LW link
(thezvi.wordpress.com)

Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide

Andrew_Critch · Nov 24, 2022, 9:16 PM
58 points
35 comments · 3 min read · LW link · 1 review

Humans do acausal coordination all the time

Adam Jermyn · Nov 2, 2022, 2:40 PM
57 points
35 comments · 3 min read · LW link

Some advice on independent research

Marius Hobbhahn · Nov 8, 2022, 2:46 PM
56 points
5 comments · 10 min read · LW link

A philosopher’s critique of RLHF

TW123 · Nov 7, 2022, 2:42 AM
55 points
8 comments · 2 min read · LW link

Human-level Diplomacy was my fire alarm

Lao Mein · Nov 23, 2022, 10:05 AM
54 points
15 comments · 3 min read · LW link

Announcing Nonlinear Emergency Funding

KatWoods · Nov 13, 2022, 7:02 PM
54 points
0 comments · LW link

Kelsey Piper’s recent interview of SBF

agucova · Nov 16, 2022, 8:30 PM
51 points
29 comments · LW link

Human-level Full-Press Diplomacy (some bare facts).

Cleo Nardo · Nov 22, 2022, 8:59 PM
50 points
7 comments · 3 min read · LW link

Noting an unsubstantiated communal belief about the FTX disaster

Yitz · Nov 13, 2022, 5:37 AM
50 points
52 comments · LW link

What’s the Alternative to Independence?

jefftk · Nov 13, 2022, 3:30 PM
50 points
3 comments · 1 min read · LW link
(www.jefftk.com)

“Rudeness”, a useful coordination mechanic

Raemon · Nov 11, 2022, 10:27 PM
49 points
20 comments · 2 min read · LW link

Developer experience for the motivation

Adam Zerner · Nov 16, 2022, 7:12 AM
49 points
7 comments · 4 min read · LW link

Don’t align agents to evaluations of plans

TurnTrout · Nov 26, 2022, 9:16 PM
48 points
49 comments · 18 min read · LW link

Information Markets

eva_ · Nov 2, 2022, 1:24 AM
46 points
6 comments · 12 min read · LW link

A Mystery About High Dimensional Concept Encoding

Fabien Roger · Nov 3, 2022, 5:05 PM
46 points
13 comments · 7 min read · LW link

A Short Dialogue on the Meaning of Reward Functions

Nov 19, 2022, 9:04 PM
45 points
0 comments · 3 min read · LW link

For ELK truth is mostly a distraction

c.trout · Nov 4, 2022, 9:14 PM
44 points
0 comments · 21 min read · LW link

The FTX Saga—Simplified

Annapurna · Nov 16, 2022, 2:42 AM
44 points
10 comments · 7 min read · LW link
(jorgevelez.substack.com)

Spectrum of Independence

jefftk · Nov 5, 2022, 2:40 AM
43 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Rationalist Town Hall: FTX Fallout Edition (RSVP Required)

Ben Pace · Nov 23, 2022, 1:38 AM
43 points
13 comments · 2 min read · LW link

The biological function of love for non-kin is to gain the trust of people we cannot deceive

chaosmage · Nov 7, 2022, 8:26 PM
43 points
3 comments · 8 min read · LW link

The optimal angle for a solar boiler is different than for a solar panel

Yair Halberstadt · Nov 10, 2022, 10:32 AM
42 points
4 comments · 2 min read · LW link

We must be very clear: fraud in the service of effective altruism is unacceptable

evhub · Nov 10, 2022, 11:31 PM
42 points
56 comments · LW link

Weekly Roundup #4

Zvi · Nov 4, 2022, 3:00 PM
42 points
1 comment · 6 min read · LW link
(thezvi.wordpress.com)

A newcomer’s guide to the technical AI safety field

zeshen · Nov 4, 2022, 2:29 PM
42 points
3 comments · 10 min read · LW link

Why square errors?

Aprillion · Nov 26, 2022, 1:40 PM
41 points
11 comments · 2 min read · LW link

Counterfactability

Scott Garrabrant · Nov 7, 2022, 5:39 AM
40 points
5 comments · 11 min read · LW link

Scott Aaronson on “Reform AI Alignment”

Shmi · Nov 20, 2022, 10:20 PM
39 points
17 comments · 1 min read · LW link
(scottaaronson.blog)