An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet) 30 Jul 2024 21:44 UTC
21 points
1 comment · 1 min read · LW link
(arxiv.org)

Open Source Automated Interpretability for Sparse Autoencoder Features

30 Jul 2024 21:11 UTC
67 points
1 comment · 13 min read · LW link
(blog.eleuther.ai)

Caterpillars and Philosophy

Zero Contradictions 30 Jul 2024 20:54 UTC
2 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

François Chollet on the limitations of LLMs in reasoning

2PuNCheeZ 30 Jul 2024 20:04 UTC
1 point
1 comment · 2 min read · LW link
(x.com)

Against AI As An Existential Risk

Noah Birnbaum 30 Jul 2024 19:10 UTC
6 points
13 comments · 1 min read · LW link
(irrationalitycommunity.substack.com)

[Question] Is objective morality self-defeating?

dialectica 30 Jul 2024 18:23 UTC
−4 points
3 comments · 2 min read · LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten 30 Jul 2024 16:36 UTC
6 points
0 comments · 9 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

30 Jul 2024 16:22 UTC
192 points
43 comments · 12 min read · LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

30 Jul 2024 15:41 UTC
32 points
0 comments · 15 min read · LW link

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?

scasper 30 Jul 2024 14:57 UTC
25 points
0 comments · 4 min read · LW link

RTFB: California’s AB 3211

Zvi 30 Jul 2024 13:10 UTC
62 points
2 comments · 11 min read · LW link
(thezvi.wordpress.com)

If You Can Climb Up, You Can Climb Down

jefftk 30 Jul 2024 0:00 UTC
34 points
9 comments · 1 min read · LW link
(www.jefftk.com)

What is Morality?

Zero Contradictions 29 Jul 2024 19:19 UTC
−1 points
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Arch-anarchism and immortality

Peter lawless 29 Jul 2024 18:10 UTC
−5 points
1 comment · 2 min read · LW link

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering

29 Jul 2024 17:50 UTC
17 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

New Blog Post Against AI Doom

Noah Birnbaum 29 Jul 2024 17:21 UTC
1 point
5 comments · 1 min read · LW link
(substack.com)

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan 29 Jul 2024 14:50 UTC
9 points
3 comments · 1 min read · LW link

[Question] How tokenization influences prompting?

Boris Kashirin 29 Jul 2024 10:28 UTC
9 points
4 comments · 1 min read · LW link

Understanding Positional Features in Layer 0 SAEs

29 Jul 2024 9:36 UTC
43 points
0 comments · 5 min read · LW link

Prediction Markets Explained

Benjamin_Sturisky 29 Jul 2024 8:02 UTC
1 point
0 comments · 9 min read · LW link

San Francisco ACX Meetup “First Saturday”

Nate Sternberg 29 Jul 2024 6:11 UTC
3 points
2 comments · 1 min read · LW link

Relativity Theory for What the Future ‘You’ Is and Isn’t

FlorianH 29 Jul 2024 2:01 UTC
7 points
48 comments · 4 min read · LW link

Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought

cleanwhiteroom 28 Jul 2024 19:55 UTC
2 points
2 comments · 2 min read · LW link

Making Beliefs Pay Rent

28 Jul 2024 17:59 UTC
7 points
2 comments · 1 min read · LW link

This is already your second chance

Malmesbury 28 Jul 2024 17:13 UTC
174 points
13 comments · 8 min read · LW link

[Question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?

kaler 28 Jul 2024 12:23 UTC
10 points
14 comments · 1 min read · LW link

Family and Society

Zero Contradictions 28 Jul 2024 7:05 UTC
1 point
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

[Question] What is AI Safety’s line of retreat?

Remmelt 28 Jul 2024 5:43 UTC
12 points
12 comments · 1 min read · LW link

AXRP Episode 34 - AI Evaluations with Beth Barnes

DanielFilan 28 Jul 2024 3:30 UTC
23 points
0 comments · 69 min read · LW link

Rats, Back a Candidate

Blake 28 Jul 2024 3:19 UTC
−40 points
19 comments · 1 min read · LW link

AI existential risk probabilities are too unreliable to inform policy

Oleg Trott 28 Jul 2024 0:59 UTC
18 points
5 comments · 1 min read · LW link
(www.aisnakeoil.com)

Idle Speculations on Pipeline Parallelism

DaemonicSigil 27 Jul 2024 22:40 UTC
1 point
0 comments · 4 min read · LW link
(pbement.com)

Re: Anthropic’s suggested SB-1047 amendments

RobertM 27 Jul 2024 22:32 UTC
87 points
13 comments · 9 min read · LW link
(www.documentcloud.org)

The problem with psychology is that it has no theory.

Nicholas D. 27 Jul 2024 19:36 UTC
2 points
7 comments · 4 min read · LW link
(nicholasdecker.substack.com)

Bryan Johnson and a search for healthy longevity

NancyLebovitz 27 Jul 2024 15:28 UTC
18 points
17 comments · 1 min read · LW link

What are matching markets?

ohmurphy 27 Jul 2024 15:05 UTC
12 points
0 comments · 8 min read · LW link
(ohmurphy.substack.com)

Safety consultations for AI lab employees

Zach Stein-Perlman 27 Jul 2024 15:00 UTC
181 points
4 comments · 1 min read · LW link

The Case Against UBI

Zero Contradictions 27 Jul 2024 6:36 UTC
−1 points
2 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

Unlocking Solutions—By Understanding Coordination Problems

James Stephen Brown 27 Jul 2024 4:52 UTC
54 points
4 comments · 5 min read · LW link
(nonzerosum.games)

Utilitarianism and the replaceability of desires and attachments

MichaelStJules 27 Jul 2024 1:57 UTC
7 points
2 comments · 1 min read · LW link

Inspired by: Failures in Kindness

X4vier 27 Jul 2024 1:21 UTC
61 points
2 comments · 3 min read · LW link

My Experience Using Gamification

Wyatt S 26 Jul 2024 23:06 UTC
13 points
4 comments · 4 min read · LW link

How the AI safety technical landscape has changed in the last year, according to some practitioners

tlevin 26 Jul 2024 19:06 UTC
55 points
6 comments · 2 min read · LW link

A Visual Task that’s Hard for GPT-4o, but Doable for Primary Schoolers

Lennart Finke 26 Jul 2024 17:51 UTC
25 points
4 comments · 2 min read · LW link

Unaligned AI is coming regardless.

verbalshadow 26 Jul 2024 16:41 UTC
−15 points
3 comments · 2 min read · LW link

Index of rationalist groups in the Bay Area July 2024

26 Jul 2024 16:32 UTC
35 points
10 comments · 2 min read · LW link

End Single Family Zoning by Overturning Euclid V Ambler

Maxwell Tabarrok 26 Jul 2024 14:08 UTC
32 points
1 comment · 7 min read · LW link
(www.maximum-progress.com)

Common Uses of “Acceptance”

Yi-Yang 26 Jul 2024 11:18 UTC
9 points
5 comments · 24 min read · LW link

Universal Basic Income and Poverty

Eliezer Yudkowsky 26 Jul 2024 7:23 UTC
281 points
131 comments · 9 min read · LW link

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication

26 Jul 2024 0:33 UTC
93 points
1 comment · 13 min read · LW link