Ransomware Payments Should Require a Sin Tax

Brian Bien, 22 Jul 2024 21:16 UTC
20 points
10 comments, 2 min read

The Elusive Root Cause of Schizophrenia—Thesis Introduction Only

kareempforbes, 22 Jul 2024 20:24 UTC
−9 points
0 comments, 2 min read

Is Chinese AGI a valid concern for the USA?

sammyboiz, 22 Jul 2024 20:21 UTC
0 points
2 comments, 9 min read

Trying to understand Hanson’s Cultural Drift argument

Kemp, 22 Jul 2024 20:20 UTC
9 points
3 comments, 2 min read

Efficient Dictionary Learning with Switch Sparse Autoencoders

Anish Mudide, 22 Jul 2024 18:45 UTC
118 points
19 comments, 12 min read

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

22 Jul 2024 16:17 UTC
69 points
0 comments, 16 min read

The Garden of Eden

Alexander Turok, 22 Jul 2024 16:07 UTC
23 points
2 comments, 9 min read

Caring about excellence

owencb, 22 Jul 2024 14:24 UTC
47 points
4 comments, 1 min read

Tim Dillon’s fake business is the most influential video I have watched in the last 24 months

Stuart Johnson, 22 Jul 2024 12:54 UTC
−4 points
0 comments, 1 min read
(youtu.be)

On the CrowdStrike Incident

Zvi, 22 Jul 2024 12:40 UTC
75 points
14 comments, 17 min read
(thezvi.wordpress.com)

Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents

22 Jul 2024 12:33 UTC
20 points
0 comments, 14 min read

What does “the universe is quantum” actually mean?

Tahp, 22 Jul 2024 11:52 UTC
2 points
0 comments, 14 min read

Initial Experiments Using SAEs to Help Detect AI Generated Text

Aaron_Scher, 22 Jul 2024 5:16 UTC
17 points
0 comments, 14 min read

Categories of leadership on technical teams

benkuhn, 22 Jul 2024 4:50 UTC
35 points
0 comments, 8 min read
(www.benkuhn.net)

An experiment on hidden cognition

Olli Järviniemi, 22 Jul 2024 3:26 UTC
25 points
2 comments, 7 min read

OpenAI Boycott Revisit

Jake Dennie, 22 Jul 2024 1:44 UTC
17 points
2 comments, 2 min read

Coalitional agency

Richard_Ngo, 22 Jul 2024 0:09 UTC
56 points
6 comments, 6 min read

The AI Driver’s Licence—A Policy Proposal

21 Jul 2024 20:38 UTC
0 points
1 comment, 19 min read

Demography and Destiny

Zero Contradictions, 21 Jul 2024 20:34 UTC
6 points
11 comments, 1 min read
(thewaywardaxolotl.blogspot.com)

The $100B plan with “70% risk of killing us all” w Stephen Fry [video]

Oleg Trott, 21 Jul 2024 20:06 UTC
34 points
8 comments, 1 min read
(www.youtube.com)

Raising Welfare for Lab Rodents

xanderbalwit, 21 Jul 2024 19:18 UTC
1 point
0 comments, 1 min read
(press.asimov.com)

A simple model of math skill

Alex_Altair, 21 Jul 2024 18:57 UTC
100 points
16 comments, 8 min read

Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen, 21 Jul 2024 18:18 UTC
25 points
11 comments, 2 min read

[Question] Would a scope-insensitive AGI be less likely to incapacitate humanity?

Jim Buhler, 21 Jul 2024 14:15 UTC
2 points
3 comments, 1 min read

Holomorphic surjection theorem (Picard’s little theorem)

dkl9, 21 Jul 2024 13:24 UTC
15 points
0 comments, 2 min read
(dkl9.net)

aimless ace analyzes active amateur: a micro-aaaaalignment proposal

lemonhope, 21 Jul 2024 12:37 UTC
12 points
0 comments, 1 min read

Pivotal Acts are easier than Alignment?

Michael Soareverix, 21 Jul 2024 12:15 UTC
1 point
4 comments, 1 min read

Ball Sq Pathways

jefftk, 21 Jul 2024 2:20 UTC
13 points
1 comment, 1 min read
(www.jefftk.com)

Freedom and Privacy of Thought Architectures

SebastianG, 20 Jul 2024 21:43 UTC
5 points
2 comments, 1 min read

Introduction to Modern Dating: Strategic Dating Advice for beginners

Jesper Lindholm, 20 Jul 2024 15:45 UTC
5 points
6 comments, 13 min read

Why Georgism Lost Its Popularity

Zero Contradictions, 20 Jul 2024 15:08 UTC
43 points
52 comments, 1 min read
(zerocontradictions.net)

Only Fools Avoid Hindsight Bias

Kevin Dorst, 20 Jul 2024 13:42 UTC
−11 points
5 comments, 6 min read
(kevindorst.substack.com)

A more systematic case for inner misalignment

Richard_Ngo, 20 Jul 2024 5:03 UTC
31 points
4 comments, 5 min read

BatchTopK: A Simple Improvement for TopK-SAEs

20 Jul 2024 2:20 UTC
52 points
0 comments, 4 min read

Krona Compare

jefftk, 20 Jul 2024 1:10 UTC
10 points
0 comments, 2 min read
(www.jefftk.com)

(Approximately) Deterministic Natural Latents

19 Jul 2024 23:02 UTC
41 points
0 comments, 4 min read

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

19 Jul 2024 20:32 UTC
59 points
6 comments, 16 min read

JumpReLU SAEs + Early Access to Gemma 2 SAEs

19 Jul 2024 16:10 UTC
48 points
10 comments, 1 min read
(storage.googleapis.com)

Truth is Universal: Robust Detection of Lies in LLMs

Lennart Buerger, 19 Jul 2024 14:07 UTC
24 points
3 comments, 2 min read
(arxiv.org)

Sustainability of Digital Life Form Societies

Hiroshi Yamakawa, 19 Jul 2024 13:59 UTC
19 points
1 comment, 20 min read

Romae Industriae

Maxwell Tabarrok, 19 Jul 2024 13:03 UTC
34 points
2 comments, 7 min read
(www.maximum-progress.com)

[Question] Have people given up on iterated distillation and amplification?

Chris_Leong, 19 Jul 2024 12:23 UTC
20 points
1 comment, 1 min read

How do we know that “good research” is good? (aka “direct evaluation” vs “eigen-evaluation”)

Ruby, 19 Jul 2024 0:31 UTC
49 points
21 comments, 6 min read

Linkpost: Surely you can be serious

kave, 18 Jul 2024 22:18 UTC
59 points
8 comments, 1 min read
(www.experimental-history.com)

My experience applying to MATS 6.0

mic, 18 Jul 2024 19:02 UTC
16 points
3 comments, 5 min read

[Question] What are the actual arguments in favor of computationalism as a theory of identity?

sunwillrise, 18 Jul 2024 18:44 UTC
12 points
24 comments, 5 min read

Yet Another Critique of “Luxury Beliefs”

ymeskhout, 18 Jul 2024 18:37 UTC
6 points
10 comments, 9 min read
(www.ymeskhout.com)

[Interim research report] Evaluating the Goal-Directedness of Language Models

18 Jul 2024 18:19 UTC
39 points
4 comments, 11 min read

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

18 Jul 2024 17:02 UTC
9 points
0 comments, 1 min read
(arxiv.org)

Activation Engineering Theories of Impact

kubanetics, 18 Jul 2024 16:44 UTC
6 points
1 comment, 2 min read