Reflective Consequentialism

Adam Zerner · Nov 18, 2022, 11:56 PM
21 points
14 comments · 4 min read · LW link

Value Created vs. Value Extracted

Sable · Nov 18, 2022, 9:34 PM
8 points
6 comments · 6 min read · LW link
(affablyevil.substack.com)

The Disastrously Confident And Inaccurate AI

Sharat Jacob Jacob · Nov 18, 2022, 7:06 PM
13 points
0 comments · 13 min read · LW link

How AI Fails Us: A non-technical view of the Alignment Problem

testingthewaters · Nov 18, 2022, 7:02 PM
7 points
1 comment · 2 min read · LW link
(ethics.harvard.edu)

[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt?

nahoj · Nov 18, 2022, 7:01 PM
15 points
10 comments · 1 min read · LW link

Distillation of “How Likely Is Deceptive Alignment?”

NickGabs · Nov 18, 2022, 4:31 PM
24 points
4 comments · 10 min read · LW link

Contra Chords

jefftk · Nov 18, 2022, 4:20 PM
12 points
1 comment · 7 min read · LW link
(www.jefftk.com)

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig · Nov 18, 2022, 12:46 PM
15 points
2 comments · 1 min read · LW link

Halifax, NS – Monthly Rationalist, EA, and ACX Meetup

Ideopunk · Nov 18, 2022, 11:45 AM
10 points
0 comments · 1 min read · LW link

Introducing The Logical Foundation, A Plan to End Poverty With Guaranteed Income

Michael Simm · Nov 18, 2022, 8:13 AM
9 points
23 comments · 1 min read · LW link

My Deontology Says Narrow-Mindedness is Always Wrong

LVSN · Nov 18, 2022, 6:11 AM
6 points
2 comments · 1 min read · LW link

AI Ethics != AI Safety

Dentin · Nov 18, 2022, 3:02 AM
2 points
0 comments · 1 min read · LW link

Don’t design agents which exploit adversarial inputs

Nov 18, 2022, 1:48 AM
72 points
64 comments · 12 min read · LW link

Engineering Monosemanticity in Toy Models

Nov 18, 2022, 1:43 AM
75 points
7 comments · 3 min read · LW link
(arxiv.org)

AGIs may value intrinsic rewards more than extrinsic ones

catubc · Nov 17, 2022, 9:49 PM
8 points
6 comments · 4 min read · LW link

LLMs may capture key components of human agency

catubc · Nov 17, 2022, 8:14 PM
27 points
0 comments · 4 min read · LW link

Mastodon Replies as Comments

jefftk · Nov 17, 2022, 8:10 PM
20 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Announcing the Progress Forum

jasoncrawford · Nov 17, 2022, 7:26 PM
83 points
9 comments · 1 min read · LW link

[Question] What kind of bias is this?

Daniel Samuel · Nov 17, 2022, 6:44 PM
3 points
2 comments · 1 min read · LW link

AI Forecasting Research Ideas

Jsevillamol · Nov 17, 2022, 5:37 PM
21 points
2 comments · 1 min read · LW link

Results from the interpretability hackathon

Nov 17, 2022, 2:51 PM
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)

Covid 11/17/22: Slow Recovery

Zvi · Nov 17, 2022, 2:50 PM
33 points
3 comments · 4 min read · LW link
(thezvi.wordpress.com)

Sadly, FTX

Zvi · Nov 17, 2022, 2:30 PM
133 points
18 comments · 47 min read · LW link
(thezvi.wordpress.com)

Deontology and virtue ethics as “effective theories” of consequentialist ethics

Jan_Kulveit · Nov 17, 2022, 2:11 PM
68 points
9 comments · 1 min read · LW link · 1 review

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)

Jessica Rumbelow · Nov 17, 2022, 11:06 AM
27 points
2 comments · 2 min read · LW link

[Question] [Personal Question] Can anyone help me navigate this potentially painful interpersonal dynamic rationally?

SlainLadyMondegreen · Nov 17, 2022, 8:53 AM
9 points
3 comments · 4 min read · LW link

Massive Scaling Should be Frowned Upon

harsimony · Nov 17, 2022, 8:43 AM
4 points
6 comments · 5 min read · LW link

[Question] Why are profitable companies laying off staff?

Yair Halberstadt · Nov 17, 2022, 6:19 AM
15 points
10 comments · 1 min read · LW link

Discussion: Was SBF a naive utilitarian, or a sociopath?

Nicholas / Heather Kross · Nov 17, 2022, 2:52 AM
0 points
4 comments · 1 min read · LW link

Kelsey Piper’s recent interview of SBF

agucova · Nov 16, 2022, 8:30 PM
51 points
29 comments · 1 min read · LW link

The Echo Principle

Jonathan Moregård · Nov 16, 2022, 8:09 PM
4 points
0 comments · 3 min read · LW link
(honestliving.substack.com)

[Question] Is there some reason LLMs haven’t seen broader use?

tailcalled · Nov 16, 2022, 8:04 PM
25 points
27 comments · 1 min read · LW link

When should we be surprised that an invention took “so long”?

jasoncrawford · Nov 16, 2022, 8:04 PM
32 points
11 comments · 4 min read · LW link
(rootsofprogress.org)

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown · Nov 16, 2022, 3:33 PM
13 points
2 comments · 12 min read · LW link
(sambrown.eu)

If Professional Investors Missed This...

jefftk · Nov 16, 2022, 3:00 PM
37 points
18 comments · 3 min read · LW link
(www.jefftk.com)

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn · Nov 16, 2022, 2:40 PM
75 points
17 comments · 7 min read · LW link · 1 review

Current themes in mechanistic interpretability research

Nov 16, 2022, 2:14 PM
89 points
2 comments · 12 min read · LW link

Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight

Jacy Reese Anthis · Nov 16, 2022, 1:54 PM
31 points
9 comments · 2 min read · LW link

Miracles and why not to believe them

mruwnik · Nov 16, 2022, 12:07 PM
4 points
0 comments · 2 min read · LW link

[Question] How do people do remote research collaborations effectively?

Krieger · Nov 16, 2022, 11:51 AM
8 points
0 comments · 1 min read · LW link

Method of statements: an alternative to taboo

Q Home · Nov 16, 2022, 10:57 AM
7 points
0 comments · 41 min read · LW link

The two conceptions of Active Inference: an intelligence architecture and a theory of agency

Roman Leventov · Nov 16, 2022, 9:30 AM
17 points
0 comments · 4 min read · LW link

Developer experience for the motivation

Adam Zerner · Nov 16, 2022, 7:12 AM
49 points
7 comments · 4 min read · LW link

Progress links and tweets, 2022-11-15

jasoncrawford · Nov 16, 2022, 3:21 AM
9 points
0 comments · 2 min read · LW link
(rootsofprogress.org)

EA & LW Forums Weekly Summary (7th Nov − 13th Nov 22’)

Zoe Williams · Nov 16, 2022, 3:04 AM
19 points
0 comments · 1 min read · LW link

The FTX Saga—Simplified

Annapurna · Nov 16, 2022, 2:42 AM
44 points
10 comments · 7 min read · LW link
(jorgevelez.substack.com)

Utilitarianism and the idea of a “rational agent” are fundamentally inconsistent with reality

banev · Nov 16, 2022, 12:19 AM
−4 points
1 comment · 1 min read · LW link

[Question] Is the speed of training large models going to increase significantly in the near future due to Cerebras Andromeda?

Amal · Nov 15, 2022, 10:50 PM
13 points
11 comments · 1 min read · LW link

[Question] What is our current best infohazard policy for AGI (safety) research?

Roman Leventov · Nov 15, 2022, 10:33 PM
12 points
2 comments · 1 min read · LW link

ACX/SSC Meetup 1 pm Sunday Nov 20

svfritz · Nov 15, 2022, 8:39 PM
2 points
0 comments · 1 min read · LW link