Oct 8, 2022, 7:09 PM

72 points

12 comments 4 min read LW link

Maximal Lottery-Lotteries

Scott Garrabrant

Oct 17, 2022, 8:39 PM

72 points

15 comments 4 min read LW link

(OLD) An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers

Neel Nanda

Oct 18, 2022, 9:08 PM

72 points

5 comments 12 min read LW link

(www.neelnanda.io)

Resources that (I think) new alignment researchers should know about

Orpheus16

Oct 28, 2022, 10:13 PM

70 points

9 comments 4 min read LW link

Signals of war in August 2021

yieldthought

Oct 26, 2022, 8:11 AM

70 points

16 comments 2 min read LW link

The Balto/Togo theory of scientific development

Elizabeth

Oct 9, 2022, 6:30 PM

69 points

5 comments 2 min read LW link

(acesounderglass.com)

New book on s-risks

Tobias_Baumann

Oct 28, 2022, 9:36 AM

68 points

1 comment LW link

QAPR 4: Inductive biases

Quintin Pope

Oct 10, 2022, 10:08 PM

67 points

2 comments 18 min read LW link

Possible miracles

Orpheus16 and Thomas Larsen

Oct 9, 2022, 6:17 PM

64 points

34 comments 8 min read LW link

Notes on “Can you control the past”

So8res

Oct 20, 2022, 3:41 AM

64 points

41 comments 21 min read LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda

Oct 24, 2022, 8:45 PM

64 points

12 comments 3 min read LW link

(neelnanda.io)

Beyond Kolmogorov and Shannon

Alexander Gietelink Oldenziel and Adam Shai

Oct 25, 2022, 3:13 PM

63 points

22 comments 5 min read LW link

The harms you don’t see

ViktoriaMalyasova

Oct 16, 2022, 11:45 PM

63 points

54 comments 10 min read LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook

Oct 24, 2022, 5:42 PM

62 points

0 comments LW link

Empowerment is (almost) All We Need

jacob_cannell

Oct 23, 2022, 9:48 PM

61 points

44 comments 17 min read LW link

Clarifying Your Principles

Raemon

Oct 1, 2022, 9:20 PM

60 points

10 comments 9 min read LW link

Calibration of a thousand predictions

KatjaGrace

Oct 12, 2022, 8:50 AM

59 points

7 comments 5 min read LW link

(worldspiritsockpuppet.com)

Calibrate—New Chrome Extension for hiding numbers so you can guess

chanamessinger

Oct 7, 2022, 11:21 AM

59 points

16 comments 1 min read LW link

(chrome.google.com)

How Risky Is Trick-or-Treating?

jefftk

Oct 27, 2022, 2:10 PM

58 points

18 comments 2 min read LW link

(www.jefftk.com)

aisafety.community—A living document of AI safety communities

zeshen and plex

Oct 28, 2022, 5:50 PM

58 points

23 comments 1 min read LW link

Looping

Jarred Filmer

Oct 5, 2022, 1:47 AM

56 points

6 comments 2 min read LW link

More examples of goal misgeneralization

Rohin Shah and Vikrant Varma

Oct 7, 2022, 2:38 PM

56 points

8 comments 2 min read LW link

(deepmindsafetyresearch.medium.com)

Covid 10/20/22: Wait, We Did WHAT?

Zvi

Oct 20, 2022, 9:50 PM

55 points

16 comments 16 min read LW link

(thezvi.wordpress.com)

Anonymous advice: If you want to reduce AI risk, should you take roles that advance AI capabilities?

Benjamin Hilton

Oct 11, 2022, 2:16 PM

54 points

9 comments LW link

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy

Oct 2, 2022, 1:29 AM

52 points

15 comments 1 min read LW link

(openreview.net)

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda

Oct 25, 2022, 8:24 PM

52 points

7 comments 1 min read LW link

(www.youtube.com)

Smoke without fire is scary

Adam Jermyn

Oct 4, 2022, 9:08 PM

52 points

22 comments 4 min read LW link

Towards a comprehensive study of potential psychological causes of the ordinary range of variation of affective gender identity in males

tailcalled

Oct 12, 2022, 9:10 PM

52 points

4 comments 37 min read LW link

Weekly Non-Covid News #1 (10/13/22)

Zvi

Oct 13, 2022, 3:40 PM

52 points

16 comments 16 min read LW link

(thezvi.wordpress.com)

Space

Jarred Filmer

Oct 17, 2022, 6:34 AM

50 points

0 comments 3 min read LW link

Why I think nuclear war triggered by Russian tactical nukes in Ukraine is unlikely

Dave Orr

Oct 11, 2022, 6:30 PM

50 points

7 comments 3 min read LW link

They gave LLMs access to physics simulators

ryan_b

Oct 17, 2022, 9:21 PM

50 points

18 comments 1 min read LW link

(arxiv.org)

Humans aren’t fitness maximizers

So8res

Oct 4, 2022, 1:31 AM

50 points

46 comments 5 min read LW link

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small

Haoxing Du and Buck

Oct 12, 2022, 9:25 PM

50 points

11 comments 4 min read LW link

Is GPT-N bounded by human capabilities? No.

Cleo Nardo

Oct 17, 2022, 11:26 PM

49 points

8 comments 2 min read LW link

Good ontologies induce commutative diagrams

Erik Jenner

Oct 9, 2022, 12:06 AM

49 points

5 comments 14 min read LW link

Alignment Might Never Be Solved, By Humans or AI

interstice

Oct 7, 2022, 4:14 PM

49 points

6 comments 3 min read LW link

We can do better than argmax

Jan_Kulveit

Oct 10, 2022, 10:32 AM

49 points

4 comments LW link

Prettified AI Safety Game Cards

abramdemski

Oct 11, 2022, 7:35 PM

47 points

6 comments 1 min read LW link

A common failure for foxes

Rob Bensinger

Oct 14, 2022, 10:50 PM

47 points

7 comments 2 min read LW link

[Question] What sorts of preparations ought I do in case of further escalation in Ukraine?

tailcalled

Oct 1, 2022, 4:44 PM

47 points

7 comments 1 min read LW link

Are c-sections underrated?

braces

Oct 1, 2022, 8:32 PM

47 points

15 comments 6 min read LW link

How to Take Over the Universe (in Three Easy Steps)

Writer

Oct 18, 2022, 3:04 PM

47 points

17 comments 12 min read LW link

(youtu.be)

Apollo

Jarred Filmer

Oct 10, 2022, 9:30 PM

46 points

0 comments 3 min read LW link

Four usages of “loss” in AI

TurnTrout

Oct 2, 2022, 12:52 AM

46 points

18 comments 4 min read LW link

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn

Oct 4, 2022, 7:22 AM

46 points

11 comments 1 min read LW link

(arxiv.org)

A review of the Bio-Anchors report

jylin04

Oct 3, 2022, 10:27 AM

45 points

4 comments 1 min read LW link

(docs.google.com)

Trigger-based rapid checklists

VipulNaik

Oct 26, 2022, 4:05 AM

44 points

0 comments 9 min read LW link

A conversation about Katja’s counterarguments to AI risk

Matthew Barnett and Ege Erdil

Oct 18, 2022, 6:40 PM

43 points

9 comments 33 min read LW link

Recall and Regurgitation in GPT2

Megan Kinniment

Oct 3, 2022, 7:35 PM

43 points

1 comment 26 min read LW link