Activation Pattern SVD: A proposal for SAE Interpretability

Daniel Tan 28 Jun 2024 22:12 UTC
15 points
2 comments, 2 min read, LW link

Podcast: Elizabeth & Austin on “What Manifold was allowed to do”

Austin Chen 28 Jun 2024 22:10 UTC
20 points
0 comments, 1 min read, LW link
(share.descript.com)

The Incredible Fentanyl-Detecting Machine

sarahconstantin 28 Jun 2024 22:10 UTC
154 points
26 comments, 7 min read, LW link
(sarahconstantin.substack.com)

Saving Lives Reduces Over-Population—A Counter-Intuitive Non-Zero-Sum Game

James Stephen Brown 28 Jun 2024 19:29 UTC
6 points
0 comments, 5 min read, LW link
(nonzerosum.games)

Mentorship in AGI Safety: Applications for mentorship are open!

28 Jun 2024 14:49 UTC
5 points
0 comments, 1 min read, LW link

Contra Acemoglu on AI

Maxwell Tabarrok 28 Jun 2024 13:13 UTC
48 points
0 comments, 5 min read, LW link
(www.maximum-progress.com)

Five toy worlds to think about heritability

David Hugh-Jones 28 Jun 2024 13:11 UTC
13 points
0 comments, 9 min read, LW link
(wyclif.substack.com)

[Question] How do natural sciences prove causation?

Kongo Landwalker 28 Jun 2024 11:58 UTC
1 point
3 comments, 1 min read, LW link

LessWrong/ACX meetup Transilvanya tour—Sibiu

Marius Adrian Nicoară 28 Jun 2024 11:41 UTC
1 point
1 comment, 1 min read, LW link

Bayes’ Theorem: In Search of Gold (Lesson 1)

bayesyatina 28 Jun 2024 8:39 UTC
3 points
0 comments, 3 min read, LW link

How a chip is designed

YM 28 Jun 2024 8:04 UTC
65 points
4 comments, 5 min read, LW link

The Wisdom of Living for 200 Years

Martin Sustrik 28 Jun 2024 4:44 UTC
25 points
3 comments, 4 min read, LW link

A Generally Intelligent Game

snerx 28 Jun 2024 1:31 UTC
−1 points
1 comment, 4 min read, LW link

Corrigibility = Tool-ness?

28 Jun 2024 1:19 UTC
78 points
8 comments, 9 min read, LW link

Situational Awareness

PeterMcCluskey 28 Jun 2024 1:08 UTC
11 points
0 comments, 12 min read, LW link
(bayesianinvestor.com)

Toward a taxonomy of cognitive benchmarks for agentic AGIs

Ben Smith 27 Jun 2024 23:50 UTC
15 points
0 comments, 5 min read, LW link

How Big a Deal are MatMul-Free Transformers?

JustisMills 27 Jun 2024 22:28 UTC
19 points
6 comments, 5 min read, LW link
(justismills.substack.com)

Secondary forces of debt

KatjaGrace 27 Jun 2024 21:10 UTC
77 points
18 comments, 2 min read, LW link
(worldspiritsockpuppet.com)

Distillation of ‘Do language models plan for future tokens’

TheManxLoiner 27 Jun 2024 20:57 UTC
26 points
2 comments, 6 min read, LW link

how birds sense magnetic fields

bhauth 27 Jun 2024 18:59 UTC
51 points
4 comments, 5 min read, LW link
(www.bhauth.com)

Representation Tuning

Christopher Ackerman 27 Jun 2024 17:44 UTC
35 points
9 comments, 13 min read, LW link

An issue with training schemers with supervised fine-tuning

Fabien Roger 27 Jun 2024 15:37 UTC
49 points
12 comments, 6 min read, LW link

AI #70: A Beautiful Sonnet

Zvi 27 Jun 2024 14:40 UTC
38 points
0 comments, 44 min read, LW link
(thezvi.wordpress.com)

Detecting Genetically Engineered Viruses With Metagenomic Sequencing

jefftk 27 Jun 2024 14:01 UTC
87 points
10 comments, 1 min read, LW link
(naobservatory.org)

Cross Robin

jefftk 27 Jun 2024 3:10 UTC
11 points
2 comments, 1 min read, LW link
(www.jefftk.com)

Live Theory Part 0: Taking Intelligence Seriously

Sahil 26 Jun 2024 21:37 UTC
94 points
3 comments, 8 min read, LW link

Instrumental vs Terminal Desiderata

Max Harms 26 Jun 2024 20:57 UTC
21 points
0 comments, 3 min read, LW link

Imbue (Generally Intelligent) continue to make progress

Nathan Helm-Burger 26 Jun 2024 20:41 UTC
18 points
0 comments, 1 min read, LW link
(imbue.com)

Tracing the steps

matimissona 26 Jun 2024 19:22 UTC
−8 points
2 comments, 4 min read, LW link

Countering AI disinformation and deep fakes with digital signatures

Dave Lindbergh 26 Jun 2024 18:09 UTC
13 points
5 comments, 1 min read, LW link

Progress Conference 2024: Toward Abundant Futures

jasoncrawford 26 Jun 2024 15:39 UTC
40 points
2 comments, 1 min read, LW link
(rootsofprogress.org)

Schelling points in the AGI policy space

mesaoptimizer 26 Jun 2024 13:19 UTC
52 points
2 comments, 6 min read, LW link

Bad lessons learned from the debate

bayesyatina 26 Jun 2024 11:54 UTC
8 points
5 comments, 6 min read, LW link

Childhood and Education Roundup #6: College Edition

Zvi 26 Jun 2024 11:40 UTC
28 points
8 comments, 23 min read, LW link
(thezvi.wordpress.com)

New fast transformer inference ASIC — Sohu by Etched

lemonhope 26 Jun 2024 9:56 UTC
8 points
9 comments, 1 min read, LW link
(www.etched.com)

Empirical vs. Mathematical Joints of Nature

26 Jun 2024 1:55 UTC
35 points
1 comment, 5 min read, LW link

My Current Claims and Cruxes on LLM Forecasting & Epistemics

ozziegooen 26 Jun 2024 0:40 UTC
11 points
0 comments, 1 min read, LW link

In favour of exploring nagging doubts about x-risk

owencb 25 Jun 2024 23:52 UTC
105 points
2 comments, 1 min read, LW link

What is a Tool?

25 Jun 2024 23:40 UTC
62 points
4 comments, 6 min read, LW link

[Question] When do alignment researchers retire?

Jordan Taylor 25 Jun 2024 23:30 UTC
4 points
2 comments, 1 min read, LW link

Compute Governance Literature Review

sijarvis 25 Jun 2024 22:41 UTC
10 points
0 comments, 13 min read, LW link

Computational Complexity as an Intuition Pump for LLM Generality

aribrill 25 Jun 2024 20:25 UTC
18 points
6 comments, 3 min read, LW link

Failure Modes of Teaching AI Safety

Eleni Angelou 25 Jun 2024 19:07 UTC
20 points
0 comments, 1 min read, LW link

Kingfisher Summer Tour 2024

jefftk 25 Jun 2024 18:50 UTC
9 points
0 comments, 1 min read, LW link
(www.jefftk.com)

Incentive Learning vs Dead Sea Salt Experiment

Steven Byrnes 25 Jun 2024 17:49 UTC
27 points
1 comment, 28 min read, LW link

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen 25 Jun 2024 15:57 UTC
25 points
0 comments, 9 min read, LW link
(adamkarvonen.github.io)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton 25 Jun 2024 15:40 UTC
156 points
11 comments, 9 min read, LW link
(www.alignment.org)

Metastrategy get-started guide

Tahp 25 Jun 2024 15:04 UTC
5 points
1 comment, 8 min read, LW link

Labor Participation is an Alignment Risk

alex 25 Jun 2024 14:15 UTC
−5 points
2 comments, 17 min read, LW link

Monthly Roundup #19: June 2024

Zvi 25 Jun 2024 12:00 UTC
28 points
9 comments, 54 min read, LW link
(thezvi.wordpress.com)