AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI

23 May 2023 21:47 UTC
25 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

The Polarity Problem [Draft]

23 May 2023 21:05 UTC
24 points
3 comments · 44 min read · LW link

Progress links and tweets, 2023-05-23

jasoncrawford · 23 May 2023 20:15 UTC
16 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

Yoshua Bengio: How Rogue AIs may Arise

harfe · 23 May 2023 18:28 UTC
92 points
12 comments · 18 min read · LW link
(yoshuabengio.org)

‘Fundamental’ vs ‘applied’ mechanistic interpretability research

Lee Sharkey · 23 May 2023 18:26 UTC
65 points
6 comments · 3 min read · LW link

Coercion is an adaptation to scarcity; trust is an adaptation to abundance

Richard_Ngo · 23 May 2023 18:14 UTC
90 points
11 comments · 4 min read · LW link

[Question] Is “brittle alignment” good enough?

the8thbit · 23 May 2023 17:35 UTC
9 points
5 comments · 3 min read · LW link

Will Artificial Superintelligence Kill Us?

James_Miller · 23 May 2023 16:27 UTC
33 points
2 comments · 22 min read · LW link

Phone Number Jingle

jefftk · 23 May 2023 15:20 UTC
11 points
12 comments · 1 min read · LW link
(www.jefftk.com)

GPT4 is capable of writing decent long-form science fiction (with the right prompts)

RomanS · 23 May 2023 13:41 UTC
22 points
28 comments · 65 min read · LW link

[Question] Do humans still provide value in correspondence chess?

Jonathan Paulson · 23 May 2023 12:15 UTC
24 points
16 comments · 1 min read · LW link

[Linkpost] The AGI Show podcast

Soroush Pour · 23 May 2023 9:52 UTC
4 points
0 comments · 1 min read · LW link

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis · 23 May 2023 5:34 UTC
15 points
15 comments · 1 min read · LW link

How I learned to stop worrying and love skill trees

junk heap homotopy · 23 May 2023 4:08 UTC
81 points
3 comments · 1 min read · LW link

T-Shirt Size Distribution

jefftk · 23 May 2023 2:40 UTC
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

AI self-improvement is possible

bhauth · 23 May 2023 2:32 UTC
18 points
3 comments · 8 min read · LW link

Worrying less about acausal extortion

Raemon · 23 May 2023 2:08 UTC
41 points
11 comments · 13 min read · LW link

Self-leadership and self-love dissolve anger and trauma

Richard_Ngo · 22 May 2023 22:30 UTC
70 points
7 comments · 5 min read · LW link

A Manifold market notice: Binance

Scrooge Mcduck · 22 May 2023 22:24 UTC
15 points
13 comments · 1 min read · LW link

I don’t want to talk about AI

KirstenH · 22 May 2023 21:23 UTC
34 points
11 comments · 2 min read · LW link
(ealifestyles.substack.com)

Activation additions in a small residual network

Garrett Baker · 22 May 2023 20:28 UTC
22 points
4 comments · 3 min read · LW link

[Linkpost] “Governance of superintelligence” by OpenAI

Daniel_Eth · 22 May 2023 20:15 UTC
67 points
20 comments · 1 min read · LW link

Two Pieces of Advice About How to Remember Things

omnizoid · 22 May 2023 18:10 UTC
13 points
3 comments · 4 min read · LW link

Why I Believe LLMs Do Not Have Human-like Emotions

OneManyNone · 22 May 2023 15:46 UTC
13 points
6 comments · 7 min read · LW link

AI Safety in China: Part 2

Lao Mein · 22 May 2023 14:50 UTC
95 points
28 comments · 2 min read · LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
155 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Papers, Please #1: Various Papers on Employment, Wages and Productivity

Zvi · 22 May 2023 12:00 UTC
42 points
2 comments · 8 min read · LW link
(thezvi.wordpress.com)

In Defense of «The Army of Jakoths»

MikkW · 22 May 2023 11:59 UTC
−14 points
10 comments · 4 min read · LW link

Speed of information input is a bottleneck for rationality

MikkW · 22 May 2023 10:24 UTC
13 points
0 comments · 4 min read · LW link

Distillation of Neurotech and Alignment Workshop January 2023

22 May 2023 7:17 UTC
51 points
9 comments · 14 min read · LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo · 22 May 2023 5:49 UTC
55 points
5 comments · 2 min read · LW link
(thetreacherousturn.ai)

The Stanley Parable: Making philosophy fun

Nathan1123 · 22 May 2023 2:15 UTC
6 points
3 comments · 3 min read · LW link

Sea Monsters

Adam Zerner · 22 May 2023 0:58 UTC
28 points
11 comments · 5 min read · LW link

The Army of Jakoths (a parable)

MikkW · 21 May 2023 22:48 UTC
−6 points
0 comments · 1 min read · LW link

A&I (Rihanna ‘S&M’ parody lyrics)

nahoj · 21 May 2023 22:34 UTC
−2 points
0 comments · 2 min read · LW link

Four Battlegrounds: Power in the Age of Artificial Intelligence (Book review)

PeterMcCluskey · 21 May 2023 21:19 UTC
25 points
0 comments · 4 min read · LW link
(bayesianinvestor.com)

Gender Vectors in ROME’s Latent Space

Xodarap · 21 May 2023 18:46 UTC
14 points
2 comments · 3 min read · LW link

Weight by Impact

Vaniver · 21 May 2023 14:37 UTC
29 points
1 comment · 3 min read · LW link

[outdated] My current theory of change to mitigate existential risk by misaligned ASI

mesaoptimizer · 21 May 2023 13:46 UTC
32 points
8 comments · 6 min read · LW link
(mesaoptimizer.com)

Babble on growing trust

qbolec · 21 May 2023 13:19 UTC
13 points
1 comment · 5 min read · LW link

Elevator Positioning

jefftk · 21 May 2023 11:30 UTC
15 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks

RogerDearnaley · 21 May 2023 8:29 UTC
9 points
1 comment · 4 min read · LW link

Jeff Clune advertising a postdoc on twitter...and asking where he should target his posts

Joyee Chen · 21 May 2023 1:02 UTC
4 points
0 comments · 1 min read · LW link

Running Sound for Yourself

jefftk · 20 May 2023 22:10 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Job Opening: SWE to help build signature vetting system for AI-related petitions

20 May 2023 19:02 UTC
52 points
0 comments · 1 min read · LW link

My Kind of Pragmatism

Nora Belrose · 20 May 2023 18:58 UTC
36 points
11 comments · 3 min read · LW link

Colors Appear To Have Almost-Universal Symbolic Associations

Thoth Hermes · 20 May 2023 18:40 UTC
−33 points
4 comments · 7 min read · LW link
(thothhermes.substack.com)

Twiblings, four-parent babies and other reproductive technology

GeneSmith · 20 May 2023 17:11 UTC
189 points
33 comments · 6 min read · LW link

P-zombies, Compression and the Simulation Hypothesis

RussellThor · 20 May 2023 11:36 UTC
5 points
0 comments · 5 min read · LW link

The possible shared Craft of deliberate Lexicogenesis

TsviBT · 20 May 2023 5:56 UTC
49 points
5 comments · 5 min read · LW link