What failure looks like

paulfchristiano · 17 Mar 2019 20:18 UTC
416 points
54 comments · 8 min read · LW link · 2 reviews

Alignment Research Field Guide

abramdemski · 8 Mar 2019 19:57 UTC
268 points
9 comments · 17 min read · LW link · 2 reviews

You Get About Five Words

Raemon · 12 Mar 2019 20:30 UTC
223 points
80 comments · 1 min read · LW link · 6 reviews

Rest Days vs Recovery Days

Unreal · 19 Mar 2019 22:37 UTC
214 points
36 comments · 6 min read · LW link · 1 review

Personalized Medicine For Real

sarahconstantin · 4 Mar 2019 22:40 UTC
214 points
16 comments · 5 min read · LW link
(srconstantin.wordpress.com)

Subagents, akrasia, and coherence in humans

Kaj_Sotala · 25 Mar 2019 14:24 UTC
138 points
31 comments · 16 min read · LW link

The Amish, and Strategic Norms around Technology

Raemon · 24 Mar 2019 22:16 UTC
138 points
18 comments · 3 min read · LW link · 2 reviews

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
121 points
26 comments · 2 min read · LW link

Subagents, introspective awareness, and blending

Kaj_Sotala · 2 Mar 2019 12:53 UTC
110 points
19 comments · 9 min read · LW link

Karma-Change Notifications

jimrandomh · 2 Mar 2019 2:52 UTC
92 points
44 comments · 1 min read · LW link

What I’ve Learned From My Parents’ Arranged Marriage

squidious · 26 Mar 2019 6:40 UTC
92 points
16 comments · 5 min read · LW link
(opalsandbonobos.blogspot.com)

mAIry’s room: AI reasoning to solve philosophical problems

Stuart_Armstrong · 5 Mar 2019 20:24 UTC
87 points
41 comments · 6 min read · LW link · 2 reviews

Plans are Recursive & Why This is Important

Ruby · 10 Mar 2019 1:58 UTC
80 points
11 comments · 10 min read · LW link

Privacy

Zvi · 15 Mar 2019 20:20 UTC
79 points
78 comments · 6 min read · LW link
(thezvi.wordpress.com)

Comparison of decision theories (with a focus on logical-counterfactual decision theories)

riceissa · 16 Mar 2019 21:15 UTC
78 points
11 comments · 10 min read · LW link

Active Curiosity vs Open Curiosity

Unreal · 15 Mar 2019 16:54 UTC
76 points
24 comments · 3 min read · LW link

Dependability

Unreal · 26 Mar 2019 22:49 UTC
75 points
39 comments · 8 min read · LW link

In My Culture

Duncan Sabien (Deactivated) · 7 Mar 2019 7:22 UTC
68 points
59 comments · 24 min read · LW link · 2 reviews
(medium.com)

Three ways that “Sufficiently optimized agents appear coherent” can be false

Wei Dai · 5 Mar 2019 21:52 UTC
65 points
3 comments · 3 min read · LW link

Boeing 737 MAX MCAS as an agent corrigibility failure

Shmi · 16 Mar 2019 1:46 UTC
60 points
3 comments · 1 min read · LW link

Declarative Mathematics

johnswentworth · 21 Mar 2019 19:05 UTC
59 points
10 comments · 3 min read · LW link

How to Understand and Mitigate Risk

Matt Goldenberg · 12 Mar 2019 10:14 UTC
55 points
30 comments · 16 min read · LW link

Do you like bullet points?

Raemon · 26 Mar 2019 4:30 UTC
52 points
38 comments · 2 min read · LW link

[Question] Understanding information cascades

13 Mar 2019 10:55 UTC
50 points
42 comments · 3 min read · LW link

Motivation: You Have to Win in the Moment

Ruby · 1 Mar 2019 0:26 UTC
49 points
20 comments · 6 min read · LW link

[Question] How much funding and researchers were in AI, and AI Safety, in 2018?

Raemon · 3 Mar 2019 21:46 UTC
41 points
11 comments · 1 min read · LW link

Renaming “Frontpage”

Raemon · 9 Mar 2019 1:23 UTC
41 points
16 comments · 4 min read · LW link

[Fiction] IO.SYS

DataPacRat · 10 Mar 2019 21:23 UTC
40 points
4 comments · 22 min read · LW link

Parfit’s Escape (Filk)

Gordon Seidoh Worley · 29 Mar 2019 2:31 UTC
39 points
0 comments · 1 min read · LW link

‘This Waifu Does Not Exist’: 100,000 StyleGAN & GPT-2 samples

gwern · 1 Mar 2019 4:29 UTC
39 points
6 comments · 1 min read · LW link
(www.thiswaifudoesnotexist.net)

Please use real names, especially for Alignment Forum?

Wei Dai · 29 Mar 2019 2:54 UTC
39 points
14 comments · 1 min read · LW link

Some thoughts after reading Artificial Intelligence: A Modern Approach

swift_spiral · 19 Mar 2019 23:39 UTC
38 points
4 comments · 2 min read · LW link

[Question] What would you need to be motivated to answer “hard” LW questions?

Raemon · 28 Mar 2019 20:07 UTC
38 points
37 comments · 3 min read · LW link

[Question] Did the recent blackmail discussion change your beliefs?

Dagon · 24 Mar 2019 16:06 UTC
36 points
7 comments · 1 min read · LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei Dai · 20 Mar 2019 9:11 UTC
35 points
3 comments · 1 min read · LW link

How dangerous is it to ride a bicycle without a helmet?

habryka · 9 Mar 2019 2:58 UTC
34 points
30 comments · 4 min read · LW link

Simplified preferences needed; simplified preferences sufficient

Stuart_Armstrong · 5 Mar 2019 19:39 UTC
33 points
6 comments · 3 min read · LW link

[Question] What societies have ever had legal or accepted blackmail?

clone of saturn · 17 Mar 2019 9:16 UTC
33 points
23 comments · 1 min read · LW link

Has “politics is the mind-killer” been a mind-killer?

SonnieBailey · 17 Mar 2019 3:05 UTC
31 points
26 comments · 3 min read · LW link

[Question] What are CAIS’ boldest near/medium-term predictions?

jacobjacob · 28 Mar 2019 13:14 UTC
31 points
17 comments · 1 min read · LW link

Finding the variables

Stuart_Armstrong · 4 Mar 2019 19:37 UTC
30 points
1 comment · 4 min read · LW link

Insights from Munkres’ Topology

Rafael Harth · 17 Mar 2019 16:52 UTC
30 points
0 comments · 14 min read · LW link

Alignment Newsletter #48

Rohin Shah · 11 Mar 2019 21:10 UTC
29 points
14 comments · 9 min read · LW link
(mailchi.mp)

Designing agent incentives to avoid side effects

11 Mar 2019 20:55 UTC
29 points
0 comments · 2 min read · LW link
(medium.com)

Humans aren’t agents—what then for value learning?

Charlie Steiner · 15 Mar 2019 22:01 UTC
28 points
14 comments · 3 min read · LW link

[Question] Willing to share some words that changed your beliefs/behavior?

Duncan Sabien (Deactivated) · 23 Mar 2019 2:08 UTC
28 points
4 comments · 1 min read · LW link

AI Safety Prerequisites Course: Basic abstract representations of computation

RAISE · 13 Mar 2019 19:38 UTC
28 points
2 comments · 1 min read · LW link

A cognitive intervention for wrist pain

rmoehn · 17 Mar 2019 5:26 UTC
28 points
24 comments · 6 min read · LW link

Book review: My Hidden Chimp

Bucky · 4 Mar 2019 9:55 UTC
28 points
0 comments · 8 min read · LW link

A theory of human values

Stuart_Armstrong · 13 Mar 2019 15:22 UTC
28 points
13 comments · 7 min read · LW link