Stop Discouraging Microwave Formula Preparation

jefftk Sep 2, 2022, 2:10 AM
68 points
12 comments 2 min read LW link
(www.jefftk.com)

Why deceptive alignment matters for AGI safety

Marius Hobbhahn Sep 15, 2022, 1:38 PM
68 points
13 comments 13 min read LW link

Quintin’s alignment papers roundup - week 2

Quintin Pope Sep 19, 2022, 1:41 PM
67 points
2 comments 10 min read LW link

Self-Control Secrets of the Puritan Masters

David Hugh-Jones Sep 26, 2022, 9:04 AM
67 points
3 comments 5 min read LW link
(wyclif.substack.com)

LOVE in a simbox is all you need

jacob_cannell Sep 28, 2022, 6:25 PM
66 points
73 comments 44 min read LW link 1 review

Where I currently disagree with Ryan Greenblatt’s version of the ELK approach

So8res Sep 29, 2022, 9:18 PM
65 points
7 comments 5 min read LW link

Book review: “The Heart of the Brain: The Hypothalamus and Its Hormones”

Steven Byrnes Sep 27, 2022, 1:20 PM
65 points
3 comments 18 min read LW link

A game of mattering

KatjaGrace Sep 23, 2022, 2:30 AM
64 points
7 comments 5 min read LW link
(worldspiritsockpuppet.com)

[Closed] Prize and fast track to alignment research at ALTER

Vanessa Kosoy Sep 17, 2022, 4:58 PM
63 points
8 comments 3 min read LW link

Clarifying the Agent-Like Structure Problem

johnswentworth Sep 29, 2022, 9:28 PM
63 points
17 comments 6 min read LW link

Infra-Exercises, Part 1

Sep 1, 2022, 5:06 AM
62 points
10 comments 1 min read LW link

Private alignment research sharing and coordination

porby Sep 4, 2022, 12:01 AM
62 points
13 comments 5 min read LW link

Gradient Hacker Design Principles From Biology

johnswentworth Sep 1, 2022, 7:03 PM
60 points
13 comments 3 min read LW link

Review of Examine.com’s vitamin write-ups

Sep 26, 2022, 11:40 PM
60 points
1 comment 5 min read LW link
(acesounderglass.com)

Fake qualities of mind

Kaj_Sotala Sep 22, 2022, 4:40 PM
59 points
2 comments 2 min read LW link
(kajsotala.fi)

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aog Sep 12, 2022, 4:08 AM
59 points
20 comments 5 min read LW link
(twitter.com)

QAPR 3: interpretability-guided training of neural nets

Quintin Pope Sep 28, 2022, 4:02 PM
58 points
2 comments 10 min read LW link

Levelling Up in AI Safety Research Engineering

Gabe M Sep 2, 2022, 4:59 AM
58 points
9 comments 17 min read LW link

Replacement for PONR concept

Daniel Kokotajlo Sep 2, 2022, 12:09 AM
58 points
6 comments 2 min read LW link

Deep Q-Networks Explained

Jay Bailey Sep 13, 2022, 12:01 PM
58 points
8 comments 20 min read LW link

Two reasons we might be closer to solving alignment than it seems

Sep 24, 2022, 8:00 PM
57 points
9 comments 4 min read LW link

We may be able to see sharp left turns coming

Sep 3, 2022, 2:55 AM
54 points
29 comments 1 min read LW link

Why was progress so slow in the past?

jasoncrawford Sep 1, 2022, 8:26 PM
54 points
31 comments 6 min read LW link
(rootsofprogress.org)

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

Sep 22, 2022, 6:41 PM
54 points
6 comments 9 min read LW link

First we shape our social graph; then it shapes us

Henrik Karlsson Sep 7, 2022, 3:50 PM
53 points
6 comments 8 min read LW link
(escapingflatland.substack.com)

When does technical work to reduce AGI conflict make a difference?: Introduction

Sep 14, 2022, 7:38 PM
52 points
3 comments 6 min read LW link

Coordinate-Free Interpretability Theory

johnswentworth Sep 14, 2022, 11:33 PM
52 points
16 comments 5 min read LW link

Many therapy schools work with inner multiplicity (not just IFS)

Sep 17, 2022, 10:27 AM
52 points
16 comments 18 min read LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo Sep 14, 2022, 7:09 PM
52 points
4 comments 1 min read LW link
(www.adept.ai)

When would AGIs engage in conflict?

Sep 14, 2022, 7:38 PM
52 points
5 comments 13 min read LW link

EA & LW Forums Weekly Summary (28 Aug − 3 Sep 22’)

Zoe Williams Sep 6, 2022, 11:06 AM
51 points
2 comments 14 min read LW link

My Thoughts on the ML Safety Course

zeshen Sep 27, 2022, 1:15 PM
50 points
3 comments 17 min read LW link

Dan Luu on Futurist Predictions

RobertM Sep 14, 2022, 3:01 AM
50 points
9 comments 5 min read LW link
(danluu.com)

Some notes on solving hard problems

Joe Rocca Sep 19, 2022, 12:58 PM
50 points
8 comments 29 min read LW link

Soft skills for meetups

mingyuan Sep 27, 2022, 5:26 PM
49 points
3 comments 5 min read LW link

Understanding and avoiding value drift

TurnTrout Sep 9, 2022, 4:16 AM
48 points
14 comments 6 min read LW link

Brief Notes on Transformers

Adam Jermyn Sep 26, 2022, 2:46 PM
48 points
3 comments 2 min read LW link

Estimating the Current and Future Number of AI Safety Researchers

Stephen McAleese Sep 28, 2022, 9:11 PM
47 points
14 comments 9 min read LW link
(forum.effectivealtruism.org)

Covid 9/29/22: The Jones Act Waver

Zvi Sep 29, 2022, 6:20 PM
47 points
10 comments 24 min read LW link
(thezvi.wordpress.com)

A Library and Tutorial for Factored Cognition with Language Models

Sep 28, 2022, 6:15 PM
47 points
0 comments 1 min read LW link

[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]

David Scott Krueger (formerly: capybaralet) Sep 8, 2022, 10:28 PM
47 points
1 comment 5 min read LW link

Prize idea: Transmit MIRI and Eliezer’s worldviews

elifland Sep 19, 2022, 9:21 PM
47 points
18 comments 2 min read LW link

Scraping training data for your mind

Henrik Karlsson Sep 21, 2022, 4:27 PM
47 points
4 comments 8 min read LW link
(escapingflatland.substack.com)

AI Safety field-building projects I’d like to see

Orpheus16 Sep 11, 2022, 11:43 PM
46 points
8 comments 6 min read LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

Sep 11, 2022, 10:57 AM
46 points
13 comments 30 min read LW link

Pretending not to Notice

jefftk Sep 19, 2022, 2:30 AM
46 points
12 comments 2 min read LW link
(www.jefftk.com)

Alignment via prosocial brain algorithms

Cameron Berg Sep 12, 2022, 1:48 PM
45 points
30 comments 6 min read LW link

It matters when the first sharp left turn happens

Adam Jermyn Sep 29, 2022, 8:12 PM
45 points
9 comments 4 min read LW link

Summaries: Alignment Fundamentals Curriculum

Leon Lang Sep 18, 2022, 1:08 PM
44 points
3 comments 1 min read LW link
(docs.google.com)

Samotsvety’s AI risk forecasts

elifland Sep 9, 2022, 4:01 AM
44 points
0 comments 4 min read LW link