Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility · Orpheus16 and OliviaJ · Nov 22, 2022, 10:19 PM · 73 points, 20 comments, 4 min read
ACX Zurich November Meetup · MB · Nov 22, 2022, 9:41 PM · 1 point, 0 comments, 1 min read
Human-level Full-Press Diplomacy (some bare facts). · Cleo Nardo · Nov 22, 2022, 8:59 PM · 50 points, 7 comments, 3 min read
[Question] How does late-2022 COVID transmissibility drop over time? · Daniel Dewey · Nov 22, 2022, 7:54 PM · 8 points, 2 comments, 1 min read
AI will change the world, but won’t take it over by playing “3-dimensional chess”. · boazbarak and benedelman · Nov 22, 2022, 6:57 PM · 134 points, 97 comments, 24 min read
Progress links and tweets, 2022-11-22 · jasoncrawford · Nov 22, 2022, 5:39 PM · 17 points, 0 comments, 1 min read · (rootsofprogress.org)
Tyranny of the Epistemic Majority · Scott Garrabrant · Nov 22, 2022, 5:19 PM · 192 points, 13 comments, 9 min read · 1 review
A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2 · Neel Nanda · Nov 22, 2022, 5:12 PM · 20 points, 0 comments, 1 min read · (www.youtube.com)
Simple Improvement to College Football Overtime Rules · Zvi · Nov 22, 2022, 5:00 PM · 10 points, 0 comments, 1 min read · (thezvi.wordpress.com)
Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · Jacy Reese Anthis · Nov 22, 2022, 4:50 PM · 93 points, 64 comments, 1 min read · (www.science.org)
Austin LW meetup notes: The FTX Affair · jchan · Nov 22, 2022, 2:01 PM · 20 points, 3 comments, 16 min read
Motivated Cognition and the Multiverse of Truth · Q Home · Nov 22, 2022, 12:51 PM · 8 points, 16 comments, 24 min read
LessWrong readers are invited to apply to the Lurkshop · Jonas V and GradientDissenter · Nov 22, 2022, 9:19 AM · 101 points, 41 comments, 3 min read
Gaoxing Guy · Alok Singh · Nov 22, 2022, 1:50 AM · 3 points, 1 comment, 1 min read · (alok.github.io)
Miscellaneous First-Pass Alignment Thoughts · NickGabs · Nov 21, 2022, 9:23 PM · 12 points, 4 comments, 10 min read
[Hebbian Natural Abstractions] Introduction · Samuel Nellessen and Jan · Nov 21, 2022, 8:34 PM · 34 points, 3 comments, 4 min read · (www.snellessen.com)
Utilitarianism Meets Egalitarianism · Scott Garrabrant · Nov 21, 2022, 7:00 PM · 121 points, 16 comments, 6 min read · 1 review
Interview with Matt Freeman · Evenflair · Nov 21, 2022, 6:17 PM · 15 points, 0 comments, 1 min read · (overcast.fm)
Here’s the exit. · Valentine · Nov 21, 2022, 6:07 PM · 115 points, 180 comments, 10 min read · 5 reviews
Benefits/Risks of Scott Aaronson’s Orthodox/Reform Framing for AI Alignment · Jeremyy · Nov 21, 2022, 5:54 PM · 2 points, 1 comment
[ASoT] Reflectivity in Narrow AI · Ulisse Mini · Nov 21, 2022, 12:51 AM · 6 points, 1 comment, 1 min read
Scott Aaronson on “Reform AI Alignment” · Shmi · Nov 20, 2022, 10:20 PM · 39 points, 17 comments, 1 min read · (scottaaronson.blog)
On Morality, Ethics, and all that Jazz · Delen Heisman · Nov 20, 2022, 8:00 PM · 4 points, 4 comments, 2 min read · (delen.substack.com)
Limits to the Controllability of AGI · Roman_Yampolskiy, Remmelt Ellen and Karl von Wendt · Nov 20, 2022, 7:18 PM · 10 points, 2 comments, 9 min read
Career Scouting: Dentistry · koratkar · Nov 20, 2022, 3:55 PM · 69 points, 5 comments, 5 min read · (careerscouting.substack.com)
Decision Theory but also Ghosts · eva_ · Nov 20, 2022, 1:24 PM · 17 points, 21 comments, 10 min read
ARC paper: Formalizing the presumption of independence · Erik Jenner · Nov 20, 2022, 1:22 AM · 97 points, 2 comments, 2 min read · (arxiv.org)
Update to Mysteries of mode collapse: text-davinci-002 not RLHF · janus · Nov 19, 2022, 11:51 PM · 71 points, 8 comments, 2 min read
Make the Drought Evaporate! · AnthonyRepetto · Nov 19, 2022, 11:41 PM · 32 points, 25 comments, 3 min read
Elastic Productivity Tools · Simon Berens · Nov 19, 2022, 9:59 PM · 76 points, 8 comments, 2 min read · (simonberens.me)
A Short Dialogue on the Meaning of Reward Functions · Leon Lang, Quintin Pope and peligrietzer · Nov 19, 2022, 9:04 PM · 45 points, 0 comments, 3 min read
By Default, GPTs Think In Plain Sight · Fabien Roger · Nov 19, 2022, 7:15 PM · 88 points, 36 comments, 9 min read
Review: Bayesian Statistics the Fun Way by Will Kurt · matto · Nov 19, 2022, 6:52 PM · 4 points, 2 comments, 2 min read
[Question] How does acausal trade work in a deterministic multiverse? · sisyphus · Nov 19, 2022, 1:50 AM · 2 points, 13 comments, 1 min read
Choosing the right dish · Adam Zerner · Nov 19, 2022, 1:38 AM · 38 points, 7 comments, 8 min read
Reflective Consequentialism · Adam Zerner · Nov 18, 2022, 11:56 PM · 21 points, 14 comments, 4 min read
Value Created vs. Value Extracted · Sable · Nov 18, 2022, 9:34 PM · 8 points, 6 comments, 6 min read · (affablyevil.substack.com)
The Disastrously Confident And Inaccurate AI · Sharat Jacob Jacob · Nov 18, 2022, 7:06 PM · 13 points, 0 comments, 13 min read
How AI Fails Us: A non-technical view of the Alignment Problem · testingthewaters · Nov 18, 2022, 7:02 PM · 7 points, 1 comment, 2 min read · (ethics.harvard.edu)
[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt? · nahoj · Nov 18, 2022, 7:01 PM · 15 points, 10 comments, 1 min read
Distillation of “How Likely Is Deceptive Alignment?” · NickGabs · Nov 18, 2022, 4:31 PM · 24 points, 4 comments, 10 min read
Contra Chords · jefftk · Nov 18, 2022, 4:20 PM · 12 points, 1 comment, 7 min read · (www.jefftk.com)
[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’ · Nick_Greig · Nov 18, 2022, 12:46 PM · 15 points, 2 comments, 1 min read
Halifax, NS – Monthly Rationalist, EA, and ACX Meetup · Ideopunk · Nov 18, 2022, 11:45 AM · 10 points, 0 comments, 1 min read
Introducing The Logical Foundation, A Plan to End Poverty With Guaranteed Income · Michael Simm · Nov 18, 2022, 8:13 AM · 9 points, 23 comments
My Deontology Says Narrow-Mindedness is Always Wrong · LVSN · Nov 18, 2022, 6:11 AM · 6 points, 2 comments, 1 min read
AI Ethics != Ai Safety · Dentin · Nov 18, 2022, 3:02 AM · 2 points, 0 comments, 1 min read
Don’t design agents which exploit adversarial inputs · TurnTrout and Garrett Baker · Nov 18, 2022, 1:48 AM · 72 points, 64 comments, 12 min read
Engineering Monosemanticity in Toy Models · Adam Jermyn, evhub and Nicholas Schiefer · Nov 18, 2022, 1:43 AM · 75 points, 7 comments, 3 min read · (arxiv.org)
AGIs may value intrinsic rewards more than extrinsic ones · catubc · Nov 17, 2022, 9:49 PM · 8 points, 6 comments, 4 min read