Immortality or death by AGI

ImmortalityOrDeathByAGI, 21 Sep 2023 23:59 UTC
47 points
30 comments, 4 min read
(forum.effectivealtruism.org)

Neel Nanda on the Mechanistic Interpretability Researcher Mindset

Michaël Trazzi, 21 Sep 2023 19:47 UTC
37 points
1 comment, 3 min read
(theinsideview.ai)

Require AGI to be Explainable

PeterMcCluskey, 21 Sep 2023 16:11 UTC
5 points
0 comments, 6 min read
(bayesianinvestor.com)

Update to “Dominant Assurance Contract Platform”

moyamo, 21 Sep 2023 16:09 UTC
32 points
1 comment, 1 min read

Sparse Autoencoders: Future Work

21 Sep 2023 15:30 UTC
35 points
5 comments, 6 min read

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
159 points
8 comments, 5 min read

There should be more AI safety orgs

Marius Hobbhahn, 21 Sep 2023 14:53 UTC
181 points
25 comments, 17 min read

Ward 5: Jack Perenick and Naima Sait

jefftk, 21 Sep 2023 13:00 UTC
24 points
0 comments, 1 min read
(www.jefftk.com)

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo

Zvi, 21 Sep 2023 12:00 UTC
75 points
8 comments, 47 min read
(thezvi.wordpress.com)

[Question] How are rationalists or orgs blocked, that you can see?

Nathan Young, 21 Sep 2023 2:37 UTC
7 points
2 comments, 1 min read

Vision Weekend US Edition

Allison Duettmann, 20 Sep 2023 21:28 UTC
4 points
0 comments, 1 min read

Foresight Vision Weekend Europe Edition

Allison Duettmann, 20 Sep 2023 21:25 UTC
3 points
0 comments, 1 min read

Notes on ChatGPT’s “memory” for strings and for events

Bill Benzon, 20 Sep 2023 18:12 UTC
3 points
0 comments, 10 min read

Belief and the Truth

Sam I am, 20 Sep 2023 17:38 UTC
2 points
14 comments, 5 min read
(open.substack.com)

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

20 Sep 2023 15:23 UTC
58 points
9 comments, 1 min read
(arxiv.org)

Interpretability Externalities Case Study—Hungry Hungry Hippos

Magdalena Wache, 20 Sep 2023 14:42 UTC
64 points
22 comments, 2 min read

An Elementary Introduction to Infra-Bayesianism

CharlesRW, 20 Sep 2023 14:29 UTC
16 points
0 comments, 1 min read

Weekly Incidence Including Delay

jefftk, 20 Sep 2023 14:00 UTC
11 points
0 comments, 2 min read
(www.jefftk.com)

[Question] The stereotype of male classical music lovers being gay

BB6, 20 Sep 2023 13:23 UTC
11 points
6 comments, 1 min read

Housing Roundup #6

Zvi, 20 Sep 2023 13:10 UTC
27 points
8 comments, 14 min read
(thezvi.wordpress.com)

Careless talk on US-China AI competition? (and criticism of CAIS coverage)

Oliver Sourbut, 20 Sep 2023 12:46 UTC
9 points
2 comments, 10 min read, 2 reviews
(www.oliversourbut.net)

A New Bayesian Decision Theory

Pareto Optimal, 20 Sep 2023 9:36 UTC
−6 points
0 comments, 1 min read
(paretooptimal.substack.com)

Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)

Holly_Elmore, 19 Sep 2023 23:40 UTC
54 points
33 comments, 1 min read

The AI Explosion Might Never Happen

snewman, 19 Sep 2023 23:20 UTC
21 points
31 comments, 9 min read

Science of Deep Learning more tractably addresses the Sharp Left Turn than Agent Foundations

NickGabs, 19 Sep 2023 22:06 UTC
23 points
2 comments, 6 min read

Formalizing «Boundaries» with Markov blankets

Chipmonk, 19 Sep 2023 21:01 UTC
21 points
20 comments, 3 min read

Precision of Sets of Forecasts

niplav, 19 Sep 2023 18:19 UTC
20 points
5 comments, 10 min read

The Proxy Political Party

antidefault, 19 Sep 2023 17:47 UTC
−2 points
4 comments, 1 min read
(antidefault.net)

The Limits of the Existence Proof Argument for General Intelligence

Amadeus Pagel, 19 Sep 2023 17:45 UTC
−21 points
3 comments, 1 min read
(amadeuspagel.com)

[Question] Is there a publicly available list of examples of frontier model capabilities?

Max Kearney, 19 Sep 2023 17:45 UTC
1 point
0 comments, 1 min read

Tallinn, Estonia – ACX Meetups Everywhere Autumn 2023

Andrew, 19 Sep 2023 16:24 UTC
1 point
0 comments, 1 min read

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds, 19 Sep 2023 15:09 UTC
83 points
23 comments, 3 min read
(www.anthropic.com)

AISN #22: The Landscape of US AI Legislation - Hearings, Frameworks, Bills, and Laws

19 Sep 2023 14:44 UTC
20 points
0 comments, 5 min read
(newsletter.safe.ai)

Compilation of Profit for Good Redteaming and Responses

Brad West, 19 Sep 2023 13:34 UTC
1 point
0 comments, 9 min read

[Link post] Michael Nielsen’s “Notes on Existential Risk from Artificial Superintelligence”

Joel Becker, 19 Sep 2023 13:31 UTC
67 points
12 comments, 1 min read
(michaelnotebook.com)

[Question] Do LLMs Implement NLP Algorithms for Better Next Token Predictions?

simeon_c, 19 Sep 2023 12:28 UTC
5 points
1 comment, 1 min read

On martingales

Joey Marcellino, 19 Sep 2023 11:39 UTC
8 points
4 comments, 4 min read

Luck based medicine: angry eldritch sugar gods edition

Elizabeth, 19 Sep 2023 4:40 UTC
75 points
14 comments, 9 min read
(acesounderglass.com)

Don’t Think About the Thing Behind the Curtain.

keltan, 19 Sep 2023 2:07 UTC
4 points
0 comments, 5 min read

Panel with Israeli Prime Minister on existential risk from AI

Michaël Trazzi, 18 Sep 2023 23:16 UTC
22 points
2 comments, 1 min read
(x.com)

Some reasons why I frequently prefer communicating via text

Adam Zerner, 18 Sep 2023 21:50 UTC
51 points
18 comments, 2 min read

Why I Don’t Believe The Law of the Excluded Middle

Thoth Hermes, 18 Sep 2023 18:53 UTC
−10 points
46 comments, 5 min read
(thothhermes.substack.com)

Forecasting for Policy (FORPOL) - Main takeaways, practical learnings & report

janklenha, 18 Sep 2023 17:44 UTC
2 points
0 comments, 4 min read

The Talk: a brief explanation of sexual dimorphism

Malmesbury, 18 Sep 2023 16:23 UTC
501 points
73 comments, 16 min read, 1 review

[Question] Where might I direct promising-to-me researchers to apply for alignment jobs/grants?

abramdemski, 18 Sep 2023 16:20 UTC
45 points
10 comments, 1 min read

[Review] Move First, Think Later: Sense and Nonsense in Improving Your Chess

Arjun Panickssery, 18 Sep 2023 15:10 UTC
33 points
2 comments, 6 min read
(arjunpanickssery.substack.com)

Technical AI Safety Research Landscape [Slides]

Magdalena Wache, 18 Sep 2023 13:56 UTC
41 points
0 comments, 4 min read

The omnizoid—Heighn FDT Debate #5

Heighn, 18 Sep 2023 11:54 UTC
4 points
0 comments, 3 min read

Ask for Feelings not Tunes

jefftk, 18 Sep 2023 2:10 UTC
11 points
0 comments, 1 min read
(www.jefftk.com)

Three ways interpretability could be impactful

Arthur Conmy, 18 Sep 2023 1:02 UTC
47 points
8 comments, 4 min read