A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability

Yitz · Dec 5, 2022, 5:33 AM
26 points
0 comments · 4 min read · LW link

Nook Nature

Duncan Sabien (Deactivated) · Dec 5, 2022, 4:10 AM
54 points
18 comments · 10 min read · LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd · Dec 5, 2022, 2:26 AM
78 points
40 comments · 2 min read · LW link

Historical Notes on Charitable Funds

jefftk · Dec 4, 2022, 11:30 PM
28 points
0 comments · 3 min read · LW link
(www.jefftk.com)

AGI as a Black Swan Event

Stephen McAleese · Dec 4, 2022, 11:00 PM
8 points
8 comments · 7 min read · LW link

South Bay ACX/LW Pre-Holiday Get-Together

IS · Dec 4, 2022, 10:57 PM
10 points
0 comments · 1 min read · LW link

ChatGPT is settling the Chinese Room argument

averros · Dec 4, 2022, 8:25 PM
−7 points
7 comments · 1 min read · LW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan · Dec 4, 2022, 6:48 PM
29 points
6 comments · 1 min read · LW link

Open & Welcome Thread - December 2022

niplav · Dec 4, 2022, 3:06 PM
8 points
22 comments · 1 min read · LW link

AI can exploit safety plans posted on the Internet

Peter S. Park · Dec 4, 2022, 12:17 PM
−15 points
4 comments · LW link

ChatGPT seems overconfident to me

qbolec · Dec 4, 2022, 8:03 AM
19 points
3 comments · 16 min read · LW link

Could an AI be Religious?

mk54 · Dec 4, 2022, 5:00 AM
−12 points
14 comments · 1 min read · LW link

Can GPT-3 Write Contra Dances?

jefftk · Dec 4, 2022, 3:00 AM
6 points
4 comments · 10 min read · LW link
(www.jefftk.com)

Take 3: No indescribable heavenworlds.

Charlie Steiner · Dec 4, 2022, 2:48 AM
23 points
12 comments · 2 min read · LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution · Dec 4, 2022, 1:53 AM
60 points
30 comments · 3 min read · LW link
(www.pnas.org)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj · Dec 3, 2022, 8:32 PM
1 point
8 comments · 1 min read · LW link

Logical induction for software engineers

Alex Flint · Dec 3, 2022, 7:55 PM
163 points
8 comments · 27 min read · LW link · 1 review

Utilitarianism is the only option

aelwood · Dec 3, 2022, 5:14 PM
−13 points
7 comments · LW link

Our 2022 Giving

jefftk · Dec 3, 2022, 3:40 PM
33 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Is school good or bad?

tailcalled · Dec 3, 2022, 1:14 PM
10 points
76 comments · 1 min read · LW link

MrBeast’s Squid Game Tricked Me

lsusr · Dec 3, 2022, 5:50 AM
75 points
1 comment · 2 min read · LW link

Great Cryonics Survey of 2022

Mati_Roy · Dec 3, 2022, 5:10 AM
16 points
0 comments · 1 min read · LW link

Causal scrubbing: results on induction heads

Dec 3, 2022, 12:59 AM
34 points
1 comment · 17 min read · LW link

Causal scrubbing: results on a paren balance checker

Dec 3, 2022, 12:59 AM
34 points
2 comments · 30 min read · LW link

Causal scrubbing: Appendix

Dec 3, 2022, 12:58 AM
18 points
4 comments · 20 min read · LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

Dec 3, 2022, 12:58 AM
206 points
35 comments · 20 min read · LW link · 1 review

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner · 3 Dec 2022 0:54 UTC
17 points
1 comment · 2 min read · LW link

D&D.Sci December 2022: The Boojumologist

abstractapplic · 2 Dec 2022 23:39 UTC
32 points
9 comments · 2 min read · LW link

Subsets and quotients in interpretability

Erik Jenner · 2 Dec 2022 23:13 UTC
26 points
1 comment · 7 min read · LW link

Research Principles for 6 Months of AI Alignment Studies

Shoshannah Tekofsky · 2 Dec 2022 22:55 UTC
23 points
3 comments · 6 min read · LW link

Three Fables of Magical Girls and Longtermism

Ulisse Mini · 2 Dec 2022 22:01 UTC
33 points
11 comments · 2 min read · LW link

Brun’s theorem and sieve theory

Ege Erdil · 2 Dec 2022 20:57 UTC
31 points
1 comment · 73 min read · LW link

Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan]

hannah wing-yee · 2 Dec 2022 20:45 UTC
3 points
0 comments · 2 min read · LW link

Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis

NickGabs · 2 Dec 2022 19:14 UTC
16 points
2 comments · 4 min read · LW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien · 2 Dec 2022 16:33 UTC
45 points
8 comments · 5 min read · LW link

NeurIPS Safety & ChatGPT. MLAISU W48

2 Dec 2022 15:50 UTC
3 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

[Question] Is ChatGPT rigth when advising to brush the tongue when brushing teeth?

ChristianKl · 2 Dec 2022 14:53 UTC
13 points
14 comments · 2 min read · LW link

Jailbreaking ChatGPT on Release Day

Zvi · 2 Dec 2022 13:10 UTC
242 points
77 comments · 6 min read · LW link · 1 review
(thezvi.wordpress.com)

Deconfusing Direct vs Amortised Optimization

beren · 2 Dec 2022 11:30 UTC
134 points
19 comments · 10 min read · LW link

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout · 2 Dec 2022 2:43 UTC
149 points
22 comments · 47 min read · LW link · 3 reviews

New Feature: Collaborative editing now supports logged-out users

RobertM · 2 Dec 2022 2:41 UTC
10 points
0 comments · 1 min read · LW link

Mastering Stratego (Deepmind)

svemirski · 2 Dec 2022 2:21 UTC
6 points
0 comments · 1 min read · LW link
(www.deepmind.com)

Update on Harvard AI Safety Team and MIT AI Alignment

2 Dec 2022 0:56 UTC
60 points
4 comments · 8 min read · LW link

Quick look: cognitive damage from well-administered anesthesia

Elizabeth · 2 Dec 2022 0:40 UTC
28 points
0 comments · 4 min read · LW link
(acesounderglass.com)

Against meta-ethical hedonism

Joe Carlsmith · 2 Dec 2022 0:23 UTC
24 points
5 comments · 35 min read · LW link

Lumenators for very lazy British people

shakeelh · 2 Dec 2022 0:18 UTC
16 points
3 comments · 1 min read · LW link

Understanding goals in complex systems

Johannes C. Mayer · 1 Dec 2022 23:49 UTC
9 points
0 comments · 1 min read · LW link
(www.youtube.com)

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
302 points
33 comments · 2 min read · LW link

Playing with Aerial Photos

jefftk · 1 Dec 2022 22:50 UTC
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Take 1: We’re not going to reverse-engineer the AI.

Charlie Steiner · 1 Dec 2022 22:41 UTC
38 points
4 comments · 4 min read · LW link