Response to Holden’s alignment plan

Alex Flint

Dec 22, 2022, 4:08 PM

36 points

4 comments 6 min read LW link

Staring into the abyss as a core life skill

benkuhn

Dec 22, 2022, 3:30 PM

357 points

22 comments 12 min read LW link 1 review

(www.benkuhn.net)

Secular Solstice for children

juliawise and denkenberger

Dec 22, 2022, 2:33 PM

31 points

1 comment 3 min read LW link

Mental acceptance and reflection

remember and Gabriel Alfour

Dec 22, 2022, 2:32 PM

34 points

1 comment 2 min read LW link

Against Diversification

Jack Malde

Dec 22, 2022, 1:29 PM

4 points

0 comments 3 min read LW link

(ethicaleconomist.substack.com)

Notes on Meta’s Diplomacy-Playing AI

Erich_Grunewald

Dec 22, 2022, 11:34 AM

15 points

2 comments 14 min read LW link

(www.erichgrunewald.com)

Take 13: RLHF bad, conditioning good.

Charlie Steiner

Dec 22, 2022, 10:44 AM

54 points

4 comments 2 min read LW link

Applied Linear Algebra Lecture Series

johnswentworth

Dec 22, 2022, 6:57 AM

103 points

8 comments 1 min read LW link

Naive Set Theory, Halmos

David Udell

Dec 22, 2022, 2:34 AM

11 points

1 comment 8 min read LW link

Not Getting Hacked

jefftk

Dec 21, 2022, 9:40 PM

40 points

14 comments 7 min read LW link

(www.jefftk.com)

Metaphor.systems

the gears to ascension

Dec 21, 2022, 9:31 PM

25 points

9 comments 1 min read LW link

(metaphor.systems)

[Question] How much is DQC (Dynamic Quantum Clustering) currently looked into in AI Capabilities Research?

macmillan

Dec 21, 2022, 8:46 PM

1 point

0 comments 1 min read LW link

Think wider about the root causes of progress

jasoncrawford

Dec 21, 2022, 8:05 PM

49 points

11 comments 4 min read LW link

(rootsofprogress.org)

[Question] What readings did you consider best for the happy parts of the secular solstice?

ChristianKl

Dec 21, 2022, 3:45 PM

17 points

0 comments 1 min read LW link

Recreating logic in type theory

Thomas Kehrenberg

Dec 21, 2022, 3:19 PM

18 points

0 comments 13 min read LW link

You become the UI you use

Viliam

Dec 21, 2022, 3:04 PM

21 points

7 comments 2 min read LW link

Price’s equation for neural networks

tailcalled

Dec 21, 2022, 1:09 PM

29 points

4 comments 2 min read LW link

Decisions: Ontologically Shifting to Determinism

Chris_Leong

Dec 21, 2022, 12:41 PM

8 points

11 comments 6 min read LW link

A Comprehensive Mechanistic Interpretability Explainer & Glossary

Neel Nanda

Dec 21, 2022, 12:35 PM

91 points

6 comments 2 min read LW link

(neelnanda.io)

Google Search loses to ChatGPT fair and square

Shmi

Dec 21, 2022, 8:11 AM

14 points

17 comments 1 min read LW link

(www.surgehq.ai)

Sazen

Duncan Sabien (Inactive)

Dec 21, 2022, 7:54 AM

285 points

83 comments 12 min read LW link 2 reviews

Podcast: What’s Wrong With LessWrong

Alfred

Dec 21, 2022, 7:06 AM

−32 points

11 comments 1 min read LW link

(youtu.be)

New AI risk intro from Vox [link post]

JakubK

Dec 21, 2022, 6:00 AM

5 points

1 comment 2 min read LW link

(www.vox.com)

Local Memes Against Geometric Rationality

Scott Garrabrant

Dec 21, 2022, 3:53 AM

90 points

3 comments 6 min read LW link

Logging Shell History in Zsh

jefftk

Dec 21, 2022, 3:30 AM

19 points

2 comments 1 min read LW link

(www.jefftk.com)

CIRL Corrigibility is Fragile

Rachel Freedman and AdamGleave

Dec 21, 2022, 1:40 AM

58 points

8 comments 12 min read LW link

[Question] [DISC] Are Values Robust?

DragonGod

Dec 21, 2022, 1:00 AM

12 points

9 comments 2 min read LW link

Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values

Garrett Baker

Dec 21, 2022, 12:44 AM

9 points

10 comments 5 min read LW link

Progress links and tweets, 2022-12-20

jasoncrawford

Dec 21, 2022, 12:35 AM

12 points

0 comments 2 min read LW link

(rootsofprogress.org)

K-complexity is silly; use cross-entropy instead

So8res

Dec 20, 2022, 11:06 PM

147 points

54 comments 14 min read LW link 2 reviews

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Orpheus16

Dec 20, 2022, 9:39 PM

18 points

2 comments 11 min read LW link

Discovering Language Model Behaviors with Model-Written Evaluations

evhub and Ethan Perez

Dec 20, 2022, 8:08 PM

100 points

34 comments 1 min read LW link

(www.anthropic.com)

Reflections: Bureaucratic Hell

Haris Rashid

Dec 20, 2022, 7:22 PM

−5 points

1 comment 1 min read LW link

(www.harisrab.com)

Proliferating Education

Haris Rashid

Dec 20, 2022, 7:22 PM

−1 points

2 comments 5 min read LW link

(www.harisrab.com)

AGI is here, but nobody wants it. Why should we even care?

MGow

Dec 20, 2022, 7:14 PM

−22 points

0 comments 17 min read LW link

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov

Dec 20, 2022, 5:13 PM

33 points

3 comments 36 min read LW link

I believe some AI doomers are overconfident

FTPickle

Dec 20, 2022, 5:09 PM

8 points

15 comments 2 min read LW link

Note on algorithms with multiple trained components

Steven Byrnes

Dec 20, 2022, 5:08 PM

23 points

4 comments 2 min read LW link

Marvel Snap: Phase 2

Zvi

Dec 20, 2022, 2:50 PM

11 points

1 comment 13 min read LW link

(thezvi.wordpress.com)

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen

Dec 20, 2022, 2:35 PM

17 points

0 comments 6 min read LW link

An Open Agency Architecture for Safe Transformative AI

davidad

Dec 20, 2022, 1:04 PM

80 points

22 comments 4 min read LW link

Under-Appreciated Ways to Use Flashcards—Part I

Florence Hinder

Dec 20, 2022, 12:43 PM

22 points

5 comments 5 min read LW link

(thoughtsaver.ghost.io)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe Williams

Dec 20, 2022, 9:49 AM

10 points

0 comments LW link

[link, 2019] AI paradigm: interactive learning from unlabeled instructions

the gears to ascension

Dec 20, 2022, 6:45 AM

2 points

0 comments 2 min read LW link

(jgrizou.github.io)

[Fiction] Unspoken Stone

Gordon Seidoh Worley

Dec 20, 2022, 5:11 AM

19 points

0 comments 5 min read LW link

Notice when you stop reading right before you understand

just_browsing

Dec 20, 2022, 5:09 AM

61 points

6 comments 1 min read LW link

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner

Dec 20, 2022, 5:01 AM

25 points

1 comment 3 min read LW link

More notes from raising a late-talking kid

Steven Byrnes

Dec 20, 2022, 2:13 AM

40 points

2 comments 6 min read LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth

Dec 20, 2022, 1:22 AM

53 points

24 comments 12 min read LW link

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

Dec 19, 2022, 10:52 PM

150 points

30 comments 18 min read LW link