Update to Mysteries of mode collapse: text-davinci-002 not RLHF

janus

Nov 19, 2022, 11:51 PM

71 points

8 comments, 2 min read

Make the Drought Evaporate!

AnthonyRepetto

Nov 19, 2022, 11:41 PM

32 points

25 comments, 3 min read

Elastic Productivity Tools

Simon Berens

Nov 19, 2022, 9:59 PM

76 points

8 comments, 2 min read

(simonberens.me)

A Short Dialogue on the Meaning of Reward Functions

Leon Lang, Quintin Pope and peligrietzer

Nov 19, 2022, 9:04 PM

45 points

0 comments, 3 min read

By Default, GPTs Think In Plain Sight

Fabien Roger

Nov 19, 2022, 7:15 PM

88 points

36 comments, 9 min read

Review: Bayesian Statistics the Fun Way by Will Kurt

matto

Nov 19, 2022, 6:52 PM

4 points

2 comments, 2 min read

[Question] How does acausal trade work in a deterministic multiverse?

sisyphus

Nov 19, 2022, 1:50 AM

2 points

13 comments, 1 min read

Choosing the right dish

Adam Zerner

Nov 19, 2022, 1:38 AM

38 points

7 comments, 8 min read

Reflective Consequentialism

Adam Zerner

Nov 18, 2022, 11:56 PM

21 points

14 comments, 4 min read

Value Created vs. Value Extracted

Sable

Nov 18, 2022, 9:34 PM

8 points

6 comments, 6 min read

(affablyevil.substack.com)

The Disastrously Confident And Inaccurate AI

Sharat Jacob Jacob

Nov 18, 2022, 7:06 PM

13 points

0 comments, 13 min read

How AI Fails Us: A non-technical view of the Alignment Problem

testingthewaters

Nov 18, 2022, 7:02 PM

7 points

1 comment, 2 min read

(ethics.harvard.edu)

[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt?

nahoj

Nov 18, 2022, 7:01 PM

15 points

10 comments, 1 min read

Distillation of “How Likely Is Deceptive Alignment?”

NickGabs

Nov 18, 2022, 4:31 PM

24 points

4 comments, 10 min read

Contra Chords

jefftk

Nov 18, 2022, 4:20 PM

12 points

1 comment, 7 min read

(www.jefftk.com)

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig

Nov 18, 2022, 12:46 PM

15 points

2 comments, 1 min read

Halifax, NS – Monthly Rationalist, EA, and ACX Meetup

Ideopunk

Nov 18, 2022, 11:45 AM

10 points

0 comments, 1 min read

Introducing The Logical Foundation, A Plan to End Poverty With Guaranteed Income

Michael Simm

Nov 18, 2022, 8:13 AM

9 points

23 comments

My Deontology Says Narrow-Mindedness is Always Wrong

LVSN

Nov 18, 2022, 6:11 AM

6 points

2 comments, 1 min read

AI Ethics != Ai Safety

Dentin

Nov 18, 2022, 3:02 AM

2 points

0 comments, 1 min read

Don’t design agents which exploit adversarial inputs

TurnTrout and Garrett Baker

Nov 18, 2022, 1:48 AM

72 points

64 comments, 12 min read

Engineering Monosemanticity in Toy Models

Adam Jermyn, evhub and Nicholas Schiefer

Nov 18, 2022, 1:43 AM

75 points

7 comments, 3 min read

(arxiv.org)

AGIs may value intrinsic rewards more than extrinsic ones

catubc

Nov 17, 2022, 9:49 PM

8 points

6 comments, 4 min read

LLMs may capture key components of human agency

catubc

Nov 17, 2022, 8:14 PM

27 points

0 comments, 4 min read

Mastodon Replies as Comments

jefftk

Nov 17, 2022, 8:10 PM

20 points

0 comments, 1 min read

(www.jefftk.com)

Announcing the Progress Forum

jasoncrawford

Nov 17, 2022, 7:26 PM

83 points

9 comments, 1 min read

[Question] What kind of bias is this?

Daniel Samuel

Nov 17, 2022, 6:44 PM

3 points

2 comments, 1 min read

AI Forecasting Research Ideas

Jsevillamol

Nov 17, 2022, 5:37 PM

21 points

2 comments

Results from the interpretability hackathon

Esben Kran and Neel Nanda

Nov 17, 2022, 2:51 PM

81 points

0 comments, 6 min read

(alignmentjam.com)

Covid 11/17/22: Slow Recovery

Zvi

Nov 17, 2022, 2:50 PM

33 points

3 comments, 4 min read

(thezvi.wordpress.com)

Sadly, FTX

Zvi

Nov 17, 2022, 2:30 PM

133 points

18 comments, 47 min read

(thezvi.wordpress.com)

Deontology and virtue ethics as “effective theories” of consequentialist ethics

Jan_Kulveit

Nov 17, 2022, 2:11 PM

68 points

9 comments, 1 review

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)

Jessica Rumbelow

Nov 17, 2022, 11:06 AM

27 points

2 comments, 2 min read

[Question] [Personal Question] Can anyone help me navigate this potentially painful interpersonal dynamic rationally?

SlainLadyMondegreen

Nov 17, 2022, 8:53 AM

9 points

3 comments, 4 min read

Massive Scaling Should be Frowned Upon

harsimony

Nov 17, 2022, 8:43 AM

4 points

6 comments, 5 min read

[Question] Why are profitable companies laying off staff?

Yair Halberstadt

Nov 17, 2022, 6:19 AM

15 points

10 comments, 1 min read

Discussion: Was SBF a naive utilitarian, or a sociopath?

Nicholas / Heather Kross

Nov 17, 2022, 2:52 AM

0 points

4 comments

Kelsey Piper’s recent interview of SBF

agucova

Nov 16, 2022, 8:30 PM

51 points

29 comments

The Echo Principle

Jonathan Moregård

Nov 16, 2022, 8:09 PM

4 points

0 comments, 3 min read

(honestliving.substack.com)

[Question] Is there some reason LLMs haven’t seen broader use?

tailcalled

Nov 16, 2022, 8:04 PM

25 points

27 comments, 1 min read

When should we be surprised that an invention took “so long”?

jasoncrawford

Nov 16, 2022, 8:04 PM

32 points

11 comments, 4 min read

(rootsofprogress.org)

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown

Nov 16, 2022, 3:33 PM

13 points

2 comments, 12 min read

(sambrown.eu)

If Professional Investors Missed This...

jefftk

Nov 16, 2022, 3:00 PM

37 points

18 comments, 3 min read

(www.jefftk.com)

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn

16 Nov 2022 14:40 UTC

75 points

17 comments, 7 min read, 1 review

Current themes in mechanistic interpretability research

Lee Sharkey, Sid Black and beren

16 Nov 2022 14:14 UTC

89 points

2 comments, 12 min read

Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight

Jacy Reese Anthis

16 Nov 2022 13:54 UTC

31 points

9 comments, 2 min read

Miracles and why not to believe them

mruwnik

16 Nov 2022 12:07 UTC

4 points

0 comments, 2 min read

[Question] How do people do remote research collaborations effectively?

Krieger

16 Nov 2022 11:51 UTC

8 points

0 comments, 1 min read

Method of statements: an alternative to taboo

Q Home

16 Nov 2022 10:57 UTC

7 points

0 comments, 41 min read

The two conceptions of Active Inference: an intelligence architecture and a theory of agency

Roman Leventov

16 Nov 2022 9:30 UTC

17 points

0 comments, 4 min read