Mech Interp Challenge: September - Deciphering the Addition Model

CallumMcDougall · 13 Sep 2023 22:23 UTC
35 points
0 comments · 4 min read

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · 13 Sep 2023 21:23 UTC
59 points
1 comment · 2 min read
(aligned.substack.com)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

13 Sep 2023 18:03 UTC
15 points
1 comment · 5 min read
(newsletter.mlsafety.org)

Expanding the Scope of Superposition

Derek Larson · 13 Sep 2023 17:38 UTC
10 points
0 comments · 4 min read

Contra Yudkowsky on Epistemic Conduct for Author Criticism

Zack_M_Davis · 13 Sep 2023 15:33 UTC
69 points
38 comments · 7 min read

Apply to lead a project during the next virtual AI Safety Camp

13 Sep 2023 13:29 UTC
19 points
0 comments · 5 min read
(aisafety.camp)

Is AI Safety dropping the ball on privacy?

markov · 13 Sep 2023 13:07 UTC
50 points
17 comments · 7 min read

UDT shows that decision theory is more puzzling than ever

Wei Dai · 13 Sep 2023 12:26 UTC
206 points
55 comments · 1 min read

[Question] Alignment & Capabilities: what’s the difference?

johnhalstead · 13 Sep 2023 11:48 UTC
6 points
3 comments · 1 min read

Duty to rescue / Non-assistance à personne en danger

Thomas Sepulchre · 13 Sep 2023 9:49 UTC
15 points
5 comments · 3 min read

The Flow-Through Fallacy

Chris_Leong · 13 Sep 2023 4:28 UTC
21 points
7 comments · 1 min read

Book review: The Importance of What We Care About (Harry G. Frankfurt)

David Gross · 13 Sep 2023 4:17 UTC
7 points
0 comments · 4 min read

Padding the Corner

jefftk · 13 Sep 2023 1:30 UTC
32 points
4 comments · 1 min read
(www.jefftk.com)

[Question] Should an undergrad avoid a capabilities project?

Double · 12 Sep 2023 23:16 UTC
4 points
2 comments · 1 min read

[Linkpost] Contra four-wheeled suitcases, sort of

Gunnar_Zarncke · 12 Sep 2023 20:36 UTC
18 points
4 comments · 1 min read
(dynomight.substack.com)

Seeking Feedback on My Mechanistic Interpretability Research Agenda

RGRGRG · 12 Sep 2023 18:45 UTC
3 points
1 comment · 3 min read

Automatically finding feature vectors in the OV circuits of Transformers without using probing

Jacob Dunefsky · 12 Sep 2023 17:38 UTC
15 points
2 comments · 29 min read

Startup Roundup #1: Happy Demo Day

Zvi · 12 Sep 2023 13:20 UTC
38 points
5 comments · 15 min read
(thezvi.wordpress.com)

[Question] Is there something fundamentally wrong with the Universe?

Caerulea-Lawrence · 12 Sep 2023 12:02 UTC
6 points
80 comments · 2 min read

Stupidity is also hard

walkthroughwalls · 12 Sep 2023 2:45 UTC
−8 points
4 comments · 2 min read

Apple Cider Baklava

jefftk · 12 Sep 2023 2:10 UTC
15 points
0 comments · 1 min read
(www.jefftk.com)

How useful is Corrigibility?

martinkunev · 12 Sep 2023 0:05 UTC
11 points
4 comments · 5 min read

Contra Heighn Contra Me Contra Functional Decision Theory

omnizoid · 11 Sep 2023 19:49 UTC
−10 points
14 comments · 6 min read

Machine Evolution

11 Sep 2023 19:29 UTC
11 points
2 comments · 22 min read

[Question] Is there a hard copy of the sequences available anywhere?

Cole Wyeth · 11 Sep 2023 19:01 UTC
3 points
1 comment · 1 min read

Amazon KDP AI content guidelines

ChristianKl · 11 Sep 2023 18:36 UTC
12 points
0 comments · 1 min read

A Case for AI Safety via Law

JWJohnston · 11 Sep 2023 18:26 UTC
17 points
12 comments · 4 min read

Erdős Problems in Algorithmic Probability

Aidan Rocke · 11 Sep 2023 16:44 UTC
13 points
4 comments · 2 min read

PSA: The community is in Berkeley/Oakland, not “the Bay Area”

maia · 11 Sep 2023 15:59 UTC
104 points
7 comments · 1 min read

A Bat and Ball made me Sad

Darren McKee · 11 Sep 2023 13:48 UTC
14 points
26 comments · 1 min read

Focus on the Hardest Part First

Johannes C. Mayer · 11 Sep 2023 7:53 UTC
42 points
13 comments · 1 min read

The Promises and Pitfalls of Long-Term Forecasting

GeoVane · 11 Sep 2023 5:04 UTC
1 point
0 comments · 5 min read

Logical Share Splitting

DaemonicSigil · 11 Sep 2023 4:08 UTC
93 points
16 comments · 9 min read
(pbement.com)

[Question] High school advice

Bohaska · 11 Sep 2023 1:26 UTC
11 points
16 comments · 1 min read

Seattle Astral Codex Ten Monthly Social

a7x · 10 Sep 2023 19:00 UTC
1 point
0 comments · 1 min read

[Question] What are some good language models to experiment with?

tailcalled · 10 Sep 2023 18:31 UTC
16 points
3 comments · 1 min read

Playing the game vs. finding a cheat code

Metacelsus · 10 Sep 2023 18:11 UTC
32 points
1 comment · 3 min read
(open.substack.com)

Cruxes on US lead for some domestic AI regulation

Zach Stein-Perlman · 10 Sep 2023 18:00 UTC
26 points
3 comments · 2 min read

Using Negative Hallucinations to Manage Sexual Desire

Johannes C. Mayer · 10 Sep 2023 11:56 UTC
−2 points
24 comments · 1 min read

Feature proposal: Export ACX meetups

Viliam · 10 Sep 2023 10:50 UTC
11 points
7 comments · 1 min read

Betting and forecasting

CarlJ · 9 Sep 2023 20:03 UTC
2 points
0 comments · 1 min read

AI presidents discuss AI alignment agendas

9 Sep 2023 18:55 UTC
217 points
23 comments · 1 min read
(www.youtube.com)

Probabilistic argument relationships and an invitation to the argument mapping community

lunatic_at_large · 9 Sep 2023 18:45 UTC
13 points
4 comments · 10 min read

How teams went about their research at AI Safety Camp edition 8

9 Sep 2023 16:34 UTC
28 points
0 comments · 13 min read

Panel discussion on AI consciousness with Rob Long and Jeff Sebo

Aaron Bergman · 9 Sep 2023 3:38 UTC
10 points
0 comments · 1 min read
(www.youtube.com)

Possible Divergence in AGI Risk Tolerance between Selfish and Altruistic agents

Brad West · 9 Sep 2023 0:23 UTC
1 point
1 comment · 2 min read

Capture the Flag Mechanistic Interpretability Challenges

8 Sep 2023 23:00 UTC
24 points
0 comments · 7 min read

[Question] What is to be done? (About the profit motive)

Connor Barber · 8 Sep 2023 19:27 UTC
1 point
21 comments · 1 min read

What is the optimal frontier for due diligence?

8 Sep 2023 18:20 UTC
41 points
1 comment · 1 min read

Progress links digest, 2023-09-08: The Conservative Futurist, cargo airships, and more

jasoncrawford · 8 Sep 2023 17:48 UTC
14 points
7 comments · 5 min read
(rootsofprogress.org)