Aversion Factoring

CFAR!Duncan · Jul 7, 2022, 4:09 PM
79 points
1 comment · 8 min read · LW link

A Pattern Language For Rationality

Vaniver · Jul 5, 2022, 7:08 PM
75 points
14 comments · 15 min read · LW link

Which values are stable under ontology shifts?

Richard_Ngo · Jul 23, 2022, 2:40 AM
74 points
48 comments · 3 min read · LW link
(thinkingcomplete.blogspot.com)

Principles of Privacy for Alignment Research

johnswentworth · Jul 27, 2022, 7:53 PM
73 points
31 comments · 7 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · Jul 29, 2022, 6:59 PM
72 points
3 comments · 16 min read · LW link

NeurIPS ML Safety Workshop 2022

Dan H · Jul 26, 2022, 3:28 PM
72 points
2 comments · 1 min read · LW link
(neurips2022.mlsafety.org)

A time-invariant version of Laplace’s rule

Jul 15, 2022, 7:28 PM
72 points
13 comments · 17 min read · LW link
(epochai.org)

Avoid the abbreviation “FLOPs” – use “FLOP” or “FLOP/s” instead

Daniel_Eth · Jul 10, 2022, 10:44 AM
70 points
13 comments · 1 min read · LW link

Cognitive Risks of Adolescent Binge Drinking

Jul 20, 2022, 9:10 PM
70 points
12 comments · 10 min read · LW link
(acesounderglass.com)

Taste & Shaping

CFAR!Duncan · Jul 10, 2022, 5:50 AM
67 points
1 comment · 16 min read · LW link

My vision of a good future, part I

Jeffrey Ladish · Jul 6, 2022, 1:23 AM
66 points
18 comments · 9 min read · LW link

Curating “The Epistemic Sequences” (list v.0.1)

Andrew_Critch · Jul 23, 2022, 10:17 PM
65 points
12 comments · 7 min read · LW link

Applications are open for CFAR workshops in Prague this fall!

John Steidley · Jul 19, 2022, 6:29 PM
64 points
3 comments · 2 min read · LW link

What’s next for instrumental rationality?

Andrew_Critch · Jul 23, 2022, 10:55 PM
63 points
7 comments · 1 min read · LW link

Introducing the Fund for Alignment Research (We’re Hiring!)

Jul 6, 2022, 2:07 AM
62 points
0 comments · 4 min read · LW link

Response to Blake Richards: AGI, generality, alignment, & loss functions

Steven Byrnes · Jul 12, 2022, 1:56 PM
62 points
9 comments · 15 min read · LW link

My Most Likely Reason to Die Young is AI X-Risk

AISafetyIsNotLongtermist · Jul 4, 2022, 5:08 PM
61 points
24 comments · 4 min read · LW link
(forum.effectivealtruism.org)

Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
59 points
8 comments · 20 min read · LW link

Double Crux

CFAR!Duncan · Jul 24, 2022, 6:34 AM
58 points
9 comments · 11 min read · LW link

When Giving People Money Doesn’t Help

Zvi · Jul 7, 2022, 1:00 PM
58 points
12 comments · 10 min read · LW link
(thezvi.wordpress.com)

A Bias Against Altruism

Lone Pine · Jul 23, 2022, 8:44 PM
58 points
30 comments · 2 min read · LW link

Deep learning curriculum for large language model alignment

Jacob_Hilton · Jul 13, 2022, 9:58 PM
57 points
3 comments · 1 min read · LW link
(github.com)

The Reader’s Guide to Optimal Monetary Policy

Ege Erdil · Jul 25, 2022, 3:10 PM
57 points
10 comments · 14 min read · LW link

Deception?! I ain’t got time for that!

Paul Colognese · Jul 18, 2022, 12:06 AM
55 points
5 comments · 13 min read · LW link

[AN #172] Sorry for the long hiatus!

Rohin Shah · Jul 5, 2022, 6:20 AM
54 points
0 comments · 3 min read · LW link
(mailchi.mp)

Don’t take the organizational chart literally

lc · Jul 21, 2022, 12:56 AM
54 points
21 comments · 4 min read · LW link

Comfort Zone Exploration

CFAR!Duncan · Jul 15, 2022, 9:18 PM
51 points
2 comments · 12 min read · LW link

Outer vs inner misalignment: three framings

Richard_Ngo · Jul 6, 2022, 7:46 PM
51 points
5 comments · 9 min read · LW link

Procedural Executive Function, Part 1

DaystarEld · Jul 4, 2022, 6:51 PM
50 points
8 comments · 14 min read · LW link
(daystareld.com)

Race Along Rashomon Ridge

Jul 7, 2022, 3:20 AM
50 points
15 comments · 8 min read · LW link

Making decisions using multiple worldviews

Richard_Ngo · Jul 13, 2022, 7:15 PM
50 points
10 comments · 11 min read · LW link

Acceptability Verification: A Research Agenda

Jul 12, 2022, 8:11 PM
50 points
0 comments · 1 min read · LW link
(docs.google.com)

Report from a civilizational observer on Earth

owencb · Jul 9, 2022, 5:26 PM
49 points
12 comments · 6 min read · LW link

Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding

Vael Gates · Jul 28, 2022, 9:29 PM
49 points
3 comments · 6 min read · LW link

Potato diet: A post mortem and an answer to SMTM’s article

Épiphanie Gédéon · Jul 14, 2022, 11:18 PM
48 points
34 comments · 16 min read · LW link

The Alignment Problem

lsusr · Jul 11, 2022, 3:03 AM
46 points
18 comments · 3 min read · LW link

Babysitting as Parenting Trial?

jefftk · Jul 7, 2022, 1:20 PM
46 points
19 comments · 3 min read · LW link
(www.jefftk.com)

The Most Important Century: The Animation

Jul 24, 2022, 8:58 PM
46 points
2 comments · 20 min read · LW link
(youtu.be)

Deontological Evil

lsusr · Jul 2, 2022, 6:57 AM
44 points
4 comments · 2 min read · LW link

Tarnished Guy who Puts a Num on it

Jacob Falkovich · Jul 6, 2022, 6:05 PM
44 points
11 comments · 4 min read · LW link

Eavesdropping on Aliens: A Data Decoding Challenge

anonymousaisafety · Jul 24, 2022, 4:35 AM
44 points
9 comments · 4 min read · LW link

Goal Alignment Is Robust To the Sharp Left Turn

Thane Ruthenis · Jul 13, 2022, 8:23 PM
43 points
16 comments · 4 min read · LW link

Systemization

CFAR!Duncan · Jul 11, 2022, 6:39 PM
42 points
5 comments · 12 min read · LW link

Bucket Errors

CFAR!Duncan · Jul 29, 2022, 6:50 PM
42 points
7 comments · 11 min read · LW link

Safety considerations for online generative modeling

Sam Marks · Jul 7, 2022, 6:31 PM
42 points
9 comments · 14 min read · LW link

Artificial Sandwiching: When can we test scalable alignment protocols without humans?

Sam Bowman · Jul 13, 2022, 9:14 PM
42 points
6 comments · 5 min read · LW link

Meiosis is all you need

Metacelsus · Jul 1, 2022, 7:39 AM
41 points
3 comments · 2 min read · LW link
(denovo.substack.com)

The curious case of Pretty Good human inner/outer alignment

PavleMiha · Jul 5, 2022, 7:04 PM
41 points
45 comments · 4 min read · LW link

QNR Prospects

PeterMcCluskey · Jul 16, 2022, 2:03 AM
40 points
3 comments · 8 min read · LW link
(www.bayesianinvestor.com)

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan H · Jul 2, 2022, 12:09 AM
40 points
0 comments · 1 min read · LW link
(arxiv.org)