Reward is not the optimization target

TurnTrout · Jul 25, 2022, 12:03 AM
375 points
123 comments · 10 min read · LW link · 3 reviews

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra · Jul 18, 2022, 7:06 PM
368 points
95 comments · 75 min read · LW link · 1 review

What should you change in response to an “emergency”? And AI risk

AnnaSalamon · Jul 18, 2022, 1:11 AM
338 points
60 comments · 6 min read · LW link · 1 review

Looking back on my alignment PhD

TurnTrout · Jul 1, 2022, 3:19 AM
334 points
66 comments · 11 min read · LW link

On how various plans miss the hard bits of the alignment challenge

So8res · Jul 12, 2022, 2:49 AM
313 points
89 comments · 29 min read · LW link · 3 reviews

Toni Kurz and the Insanity of Climbing Mountains

GeneSmith · Jul 3, 2022, 8:51 PM
271 points
67 comments · 11 min read · LW link · 2 reviews

Changing the world through slack & hobbies

Steven Byrnes · Jul 21, 2022, 6:11 PM
261 points
13 comments · 10 min read · LW link

Safetywashing

Adam Scholl · Jul 1, 2022, 11:56 AM
260 points
20 comments · 1 min read · LW link · 2 reviews

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter · Jul 19, 2022, 6:06 PM
256 points
72 comments · 1 min read · LW link

Humans provide an untapped wealth of evidence about alignment

Jul 14, 2022, 2:31 AM
211 points
94 comments · 9 min read · LW link · 1 review

Unifying Bargaining Notions (1/2)

Diffractor · Jul 25, 2022, 12:28 AM
210 points
41 comments · 16 min read · LW link

A note about differential technological development

So8res · Jul 15, 2022, 4:46 AM
197 points
33 comments · 6 min read · LW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · Jul 22, 2022, 6:44 PM
195 points
29 comments · 14 min read · LW link
(theinsideview.ai)

AGI ruin scenarios are likely (and disjunctive)

So8res · Jul 27, 2022, 3:21 AM
175 points
38 comments · 6 min read · LW link

ITT-passing and civility are good; “charity” is bad; steelmanning is niche

Rob Bensinger · Jul 5, 2022, 12:15 AM
163 points
36 comments · 6 min read · LW link · 1 review

«Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch · Jul 26, 2022, 11:03 PM
158 points
33 comments · 7 min read · LW link

Resolve Cycles

CFAR!Duncan · Jul 16, 2022, 11:17 PM
140 points
8 comments · 10 min read · LW link

Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose

moridinamael · Jul 6, 2022, 2:20 PM
136 points
16 comments · 6 min read · LW link

Brainstorm of things that could force an AI team to burn their lead

So8res · Jul 24, 2022, 11:58 PM
134 points
8 comments · 13 min read · LW link

AI Forecasting: One Year In

jsteinhardt · Jul 4, 2022, 5:10 AM
132 points
12 comments · 6 min read · LW link
(bounded-regret.ghost.io)

Conjecture: Internal Infohazard Policy

Jul 29, 2022, 7:07 PM
131 points
6 comments · 19 min read · LW link

Limerence Messes Up Your Rationality Real Bad, Yo

Raemon · Jul 1, 2022, 4:53 PM
128 points
41 comments · 3 min read · LW link · 2 reviews

Principles for Alignment/Agency Projects

johnswentworth · Jul 7, 2022, 2:07 AM
122 points
20 comments · 4 min read · LW link

Unifying Bargaining Notions (2/2)

Diffractor · Jul 27, 2022, 3:40 AM
118 points
19 comments · 21 min read · LW link

Focusing

CFAR!Duncan · Jul 29, 2022, 7:15 PM
114 points
23 comments · 14 min read · LW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · Jul 14, 2022, 4:59 PM
114 points
15 comments · 33 min read · LW link

Moral strategies at different capability levels

Richard_Ngo · Jul 27, 2022, 6:50 PM
112 points
14 comments · 5 min read · LW link
(thinkingcomplete.blogspot.com)

Criticism of EA Criticism Contest

Zvi · Jul 14, 2022, 2:30 PM
108 points
17 comments · 31 min read · LW link · 1 review
(thezvi.wordpress.com)

Examples of AI Increasing AI Progress

TW123 · Jul 17, 2022, 8:06 PM
107 points
14 comments · 1 min read · LW link

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov · Jul 15, 2022, 9:47 PM
102 points
18 comments · 6 min read · LW link

Comment on “Propositions Concerning Digital Minds and Society”

Zack_M_Davis · Jul 10, 2022, 5:48 AM
99 points
12 comments · 8 min read · LW link

Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments

Jeffrey Ladish · Jul 11, 2022, 7:38 PM
98 points
27 comments · 6 min read · LW link · 1 review

Naive Hypotheses on AI Alignment

Shoshannah Tekofsky · Jul 2, 2022, 7:03 PM
98 points
29 comments · 5 min read · LW link

A summary of every “Highlights from the Sequences” post

Orpheus16 · Jul 15, 2022, 11:01 PM
97 points
7 comments · 17 min read · LW link

Opening Session Tips & Advice

CFAR!Duncan · Jul 25, 2022, 3:57 AM
95 points
3 comments · 14 min read · LW link · 1 review

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes · Jul 19, 2022, 4:55 AM
95 points
6 comments · 2 min read · LW link

MATS Models

johnswentworth · Jul 9, 2022, 12:14 AM
94 points
5 comments · 16 min read · LW link

Human values & biases are inaccessible to the genome

TurnTrout · Jul 7, 2022, 5:29 PM
94 points
54 comments · 6 min read · LW link · 1 review

Internal Double Crux

CFAR!Duncan · Jul 22, 2022, 4:34 AM
93 points
15 comments · 12 min read · LW link

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo · Jul 10, 2022, 4:04 PM
92 points
12 comments · 5 min read · LW link

Goal Factoring

CFAR!Duncan · Jul 5, 2022, 7:10 AM
92 points
2 comments · 8 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · Jul 20, 2022, 10:44 AM
87 points
11 comments · 8 min read · LW link

Don’t use ‘infohazard’ for collectively destructive info

Eliezer Yudkowsky · Jul 15, 2022, 5:13 AM
86 points
33 comments · 1 min read · LW link · 2 reviews
(www.facebook.com)

Trigger-Action Planning

CFAR!Duncan · Jul 3, 2022, 1:42 AM
86 points
14 comments · 13 min read · LW link · 2 reviews

Trends in GPU price-performance

Jul 1, 2022, 3:51 PM
85 points
13 comments · 1 min read · LW link · 1 review
(epochai.org)

All AGI safety questions welcome (especially basic ones) [July 2022]

Jul 16, 2022, 12:57 PM
84 points
132 comments · 3 min read · LW link

Benchmark for successful concept extrapolation/avoiding goal misgeneralization

Stuart_Armstrong · Jul 4, 2022, 8:48 PM
82 points
12 comments · 4 min read · LW link

Addendum: A non-magical explanation of Jeffrey Epstein

lc · Jul 18, 2022, 5:40 PM
81 points
21 comments · 11 min read · LW link

Decision theory and dynamic inconsistency

paulfchristiano · Jul 3, 2022, 10:20 PM
80 points
33 comments · 10 min read · LW link
(sideways-view.com)

[Question] How do AI timelines affect how you live your life?

Quadratic Reciprocity · Jul 11, 2022, 1:54 PM
80 points
50 comments · 1 min read · LW link