SERI MATS Pro­gram—Win­ter 2022 Cohort

Oct 8, 2022, 7:09 PM
72 points
12 comments · 4 min read · LW link

Maximal Lottery-Lotteries

Scott Garrabrant · Oct 17, 2022, 8:39 PM
72 points
15 comments · 4 min read · LW link

(OLD) An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers

Neel Nanda · Oct 18, 2022, 9:08 PM
72 points
5 comments · 12 min read · LW link
(www.neelnanda.io)

Resources that (I think) new alignment researchers should know about

Orpheus16 · Oct 28, 2022, 10:13 PM
70 points
9 comments · 4 min read · LW link

Signals of war in August 2021

yieldthought · Oct 26, 2022, 8:11 AM
70 points
16 comments · 2 min read · LW link

The Balto/Togo theory of scientific development

Elizabeth · Oct 9, 2022, 6:30 PM
69 points
5 comments · 2 min read · LW link
(acesounderglass.com)

New book on s-risks

Tobias_Baumann · Oct 28, 2022, 9:36 AM
68 points
1 comment · LW link

QAPR 4: Inductive biases

Quintin Pope · Oct 10, 2022, 10:08 PM
67 points
2 comments · 18 min read · LW link

Possible miracles

Oct 9, 2022, 6:17 PM
64 points
34 comments · 8 min read · LW link

Notes on “Can you control the past”

So8res · Oct 20, 2022, 3:41 AM
64 points
41 comments · 21 min read · LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda · Oct 24, 2022, 8:45 PM
64 points
12 comments · 3 min read · LW link
(neelnanda.io)

Beyond Kolmogorov and Shannon

Oct 25, 2022, 3:13 PM
63 points
22 comments · 5 min read · LW link

The harms you don’t see

ViktoriaMalyasova · Oct 16, 2022, 11:45 PM
63 points
54 comments · 10 min read · LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook · Oct 24, 2022, 5:42 PM
62 points
0 comments · LW link

Empowerment is (almost) All We Need

jacob_cannell · Oct 23, 2022, 9:48 PM
61 points
44 comments · 17 min read · LW link

Clarifying Your Principles

Raemon · Oct 1, 2022, 9:20 PM
60 points
10 comments · 9 min read · LW link

Calibration of a thousand predictions

KatjaGrace · Oct 12, 2022, 8:50 AM
59 points
7 comments · 5 min read · LW link
(worldspiritsockpuppet.com)

Cal­ibrate—New Chrome Ex­ten­sion for hid­ing num­bers so you can guess

chanamessinger · Oct 7, 2022, 11:21 AM
59 points
16 comments · 1 min read · LW link
(chrome.google.com)

How Risky Is Trick-or-Treating?

jefftk · Oct 27, 2022, 2:10 PM
58 points
18 comments · 2 min read · LW link
(www.jefftk.com)

aisafety.com­mu­nity—A liv­ing doc­u­ment of AI safety communities

Oct 28, 2022, 5:50 PM
58 points
23 comments · 1 min read · LW link

Looping

Jarred Filmer · Oct 5, 2022, 1:47 AM
56 points
6 comments · 2 min read · LW link

More examples of goal misgeneralization

Oct 7, 2022, 2:38 PM
56 points
8 comments · 2 min read · LW link
(deepmindsafetyresearch.medium.com)

Covid 10/20/22: Wait, We Did WHAT?

Zvi · Oct 20, 2022, 9:50 PM
55 points
16 comments · 16 min read · LW link
(thezvi.wordpress.com)

Anonymous advice: If you want to reduce AI risk, should you take roles that advance AI capabilities?

Benjamin Hilton · Oct 11, 2022, 2:16 PM
54 points
9 comments · LW link

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy · Oct 2, 2022, 1:29 AM
52 points
15 comments · 1 min read · LW link
(openreview.net)

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda · Oct 25, 2022, 8:24 PM
52 points
7 comments · 1 min read · LW link
(www.youtube.com)

Smoke without fire is scary

Adam Jermyn · Oct 4, 2022, 9:08 PM
52 points
22 comments · 4 min read · LW link

Towards a comprehensive study of potential psychological causes of the ordinary range of variation of affective gender identity in males

tailcalled · Oct 12, 2022, 9:10 PM
52 points
4 comments · 37 min read · LW link

Weekly Non-Covid News #1 (10/13/22)

Zvi · Oct 13, 2022, 3:40 PM
52 points
16 comments · 16 min read · LW link
(thezvi.wordpress.com)

Space

Jarred Filmer · Oct 17, 2022, 6:34 AM
50 points
0 comments · 3 min read · LW link

Why I think nuclear war triggered by Russian tactical nukes in Ukraine is unlikely

Dave Orr · Oct 11, 2022, 6:30 PM
50 points
7 comments · 3 min read · LW link

They gave LLMs access to physics simulators

ryan_b · Oct 17, 2022, 9:21 PM
50 points
18 comments · 1 min read · LW link
(arxiv.org)

Humans aren’t fitness maximizers

So8res · Oct 4, 2022, 1:31 AM
50 points
46 comments · 5 min read · LW link

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small

Oct 12, 2022, 9:25 PM
50 points
11 comments · 4 min read · LW link

Is GPT-N bounded by human capabilities? No.

Cleo Nardo · Oct 17, 2022, 11:26 PM
49 points
8 comments · 2 min read · LW link

Good ontologies induce commutative diagrams

Erik Jenner · Oct 9, 2022, 12:06 AM
49 points
5 comments · 14 min read · LW link

Alignment Might Never Be Solved, By Humans or AI

interstice · Oct 7, 2022, 4:14 PM
49 points
6 comments · 3 min read · LW link

We can do better than argmax

Jan_Kulveit · Oct 10, 2022, 10:32 AM
49 points
4 comments · LW link

Prettified AI Safety Game Cards

abramdemski · Oct 11, 2022, 7:35 PM
47 points
6 comments · 1 min read · LW link

A common failure for foxes

Rob Bensinger · Oct 14, 2022, 10:50 PM
47 points
7 comments · 2 min read · LW link

[Question] What sorts of preparations ought I do in case of further escalation in Ukraine?

tailcalled · Oct 1, 2022, 4:44 PM
47 points
7 comments · 1 min read · LW link

Are c-sections underrated?

braces · Oct 1, 2022, 8:32 PM
47 points
15 comments · 6 min read · LW link

How to Take Over the Universe (in Three Easy Steps)

Writer · Oct 18, 2022, 3:04 PM
47 points
17 comments · 12 min read · LW link
(youtu.be)

Apollo

Jarred Filmer · Oct 10, 2022, 9:30 PM
46 points
0 comments · 3 min read · LW link

Four usages of “loss” in AI

TurnTrout · Oct 2, 2022, 12:52 AM
46 points
18 comments · 4 min read · LW link

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn · Oct 4, 2022, 7:22 AM
46 points
11 comments · 1 min read · LW link
(arxiv.org)

A review of the Bio-Anchors report

jylin04 · Oct 3, 2022, 10:27 AM
45 points
4 comments · 1 min read · LW link
(docs.google.com)

Trigger-based rapid checklists

VipulNaik · Oct 26, 2022, 4:05 AM
44 points
0 comments · 9 min read · LW link

A conversation about Katja’s counterarguments to AI risk

Oct 18, 2022, 6:40 PM
43 points
9 comments · 33 min read · LW link

Recall and Regurgitation in GPT2

Megan Kinniment · Oct 3, 2022, 7:35 PM
43 points
1 comment · 26 min read · LW link