My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda
Dec 25, 2021, 11:07 PM
53 points
3 comments
28 min read
LW link

[Question] What is a probabilistic physical theory?

Ege Erdil
Dec 25, 2021, 4:30 PM
15 points
36 comments
2 min read
LW link

Belief-conditional things - things that only exist when you believe in them

Jan
Dec 25, 2021, 10:49 AM
7 points
3 comments
5 min read
LW link
(universalprior.substack.com)

Tough Choices and Disappointment

maralorn
Dec 24, 2021, 9:59 PM
2 points
6 comments
1 min read
LW link

Converging toward a Million Worlds

Joe Kwon
Dec 24, 2021, 9:33 PM
11 points
1 comment
3 min read
LW link

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum
Dec 24, 2021, 6:05 PM
16 points
2 comments
3 min read
LW link

[Question] How to select a long-term goal and align my mind towards it?

Alexander
Dec 24, 2021, 11:40 AM
19 points
8 comments
2 min read
LW link

Prerequisite Skills

lsusr
Dec 24, 2021, 10:11 AM
17 points
3 comments
1 min read
LW link

Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter
Dec 24, 2021, 7:24 AM
12 points
3 comments
1 min read
LW link
(www.youtube.com)

Risks from AI persuasion

Beth Barnes
Dec 24, 2021, 1:48 AM
76 points
15 comments
31 min read
LW link

Prioritizing Information

jsteinhardt
Dec 24, 2021, 12:00 AM
18 points
0 comments
7 min read
LW link
(bounded-regret.ghost.io)

Omicron Post #9

Zvi
Dec 23, 2021, 9:50 PM
89 points
11 comments
19 min read
LW link
(thezvi.wordpress.com)

Reply to Eliezer on Biological Anchors

HoldenKarnofsky
Dec 23, 2021, 4:15 PM
149 points
46 comments
15 min read
LW link

Get Set, Also Go

Zvi
Dec 23, 2021, 3:00 PM
62 points
21 comments
16 min read
LW link
(thezvi.wordpress.com)

2021 AI Alignment Literature Review and Charity Comparison

Larks
Dec 23, 2021, 2:06 PM
168 points
28 comments
73 min read
LW link

Testing, Testing, Hopefully

Zvi
Dec 23, 2021, 12:30 PM
41 points
8 comments
4 min read
LW link
(thezvi.wordpress.com)

Physics Erotica

lsusr
Dec 23, 2021, 11:01 AM
7 points
12 comments
1 min read
LW link

[Book Review] “The Most Powerful Idea in the World” by William Rosen

lsusr
Dec 23, 2021, 8:27 AM
41 points
4 comments
8 min read
LW link

Specialization

DirectedEvolution
Dec 23, 2021, 3:23 AM
13 points
1 comment
5 min read
LW link

Worst-case thinking in AI alignment

Buck
Dec 23, 2021, 1:29 AM
166 points
18 comments
6 min read
LW link
2 reviews

[Question] Hedging the Possibility of Russia invading Ukraine

Annapurna
Dec 23, 2021, 1:13 AM
27 points
8 comments
1 min read
LW link

Gifts

George3d6
Dec 22, 2021, 11:50 PM
13 points
1 comment
9 min read
LW link
(www.epistem.ink)

A spreadsheet/template for doing an annual review

peterslattery
Dec 22, 2021, 11:29 PM
12 points
1 comment
2 min read
LW link

[Question] What time in your life were you the most productive at learning and/or thinking and why?

Jack R
Dec 22, 2021, 10:56 PM
11 points
2 comments
1 min read
LW link

Transformer Circuits

evhub
Dec 22, 2021, 9:09 PM
144 points
4 comments
3 min read
LW link
(transformer-circuits.pub)

[Question] Help figuring out my sexuality?

Centhart
Dec 22, 2021, 8:28 PM
13 points
13 comments
2 min read
LW link

DnD.Sci GURPS Evaluation and Ruleset

J Bostock
Dec 22, 2021, 7:05 PM
17 points
2 comments
6 min read
LW link

Potential gears level explanations of smooth progress

ryan_greenblatt
Dec 22, 2021, 6:05 PM
4 points
2 comments
2 min read
LW link

Random facts can come back to bite you

tailcalled
Dec 22, 2021, 5:33 PM
69 points
7 comments
2 min read
LW link
1 review

What’s Up With the CDC Nowcast?

Zvi
Dec 22, 2021, 1:00 PM
61 points
4 comments
5 min read
LW link
(thezvi.wordpress.com)

Morality and constrained maximization, part 1

Joe Carlsmith
Dec 22, 2021, 8:47 AM
20 points
5 comments
11 min read
LW link

Six Specializations Makes You World-Class

lsusr
Dec 22, 2021, 8:03 AM
53 points
23 comments
1 min read
LW link

Worldbuilding exercise: The Highwayverse.

Yair Halberstadt
Dec 22, 2021, 6:47 AM
13 points
13 comments
11 min read
LW link

Two (very different) kinds of donors

Duncan Sabien (Deactivated)
Dec 22, 2021, 1:43 AM
101 points
19 comments
3 min read
LW link

[Question] Confusion about Sequences and Review Sequences

Alex_Altair
Dec 21, 2021, 6:13 PM
14 points
3 comments
1 min read
LW link

Working through D&D.Sci, problem 1 (solution)

Pablo Repetto
Dec 21, 2021, 5:42 PM
9 points
2 comments
1 min read
LW link
(pabloernesto.github.io)

Demanding and Designing Aligned Cognitive Architectures

Koen.Holtman
Dec 21, 2021, 5:32 PM
8 points
5 comments
5 min read
LW link

Experiences raising children in shared housing

juliawise
Dec 21, 2021, 5:09 PM
117 points
4 comments
6 min read
LW link

[Question] What questions do you have about doing work on AI safety?

peterbarnett
Dec 21, 2021, 4:36 PM
13 points
8 comments
1 min read
LW link

Perpetual Dickensian Poverty?

jefftk
Dec 21, 2021, 1:30 PM
119 points
18 comments
1 min read
LW link
(www.jefftk.com)

On (Not) Reading Papers

Jan
Dec 21, 2021, 9:57 AM
53 points
10 comments
7 min read
LW link
(universalprior.substack.com)

Quick Poll: Booster Reactions

Elizabeth
Dec 21, 2021, 7:40 AM
40 points
2 comments
2 min read
LW link
(acesounderglass.com)

Book Launch: The Engines of Cognition

Ben Pace
Dec 21, 2021, 7:24 AM
174 points
56 comments
5 min read
LW link

Researcher incentives cause smoother progress on benchmarks

ryan_greenblatt
Dec 21, 2021, 4:13 AM
20 points
4 comments
1 min read
LW link

Omicron Post #8

Zvi
Dec 20, 2021, 11:10 PM
96 points
33 comments
16 min read
LW link
(thezvi.wordpress.com)

[Question] Good complete views on motivation

Valdes
Dec 20, 2021, 10:10 PM
6 points
4 comments
1 min read
LW link

Prizes for last year’s 2019 Review

Raemon
Dec 20, 2021, 9:58 PM
40 points
0 comments
3 min read
LW link

Omicron Paths

jefftk
Dec 20, 2021, 6:30 PM
14 points
8 comments
2 min read
LW link
(www.jefftk.com)

[Question] Is there a term / better way of phrasing the general case where an intervention helps certain individuals do better at zero-sum games but doesn’t provide any external value?

freedomandutility
Dec 20, 2021, 5:35 PM
4 points
8 comments
1 min read
LW link

Bayesian Dharani, Great Dharani for Conserving Evidence

Gordon Seidoh Worley
Dec 20, 2021, 4:32 PM
9 points
5 comments
1 min read
LW link