Interpreting Preference Models w/ Sparse Autoencoders

1 Jul 2024 21:35 UTC
74 points
12 comments · 9 min read · LW link

Honest science is spirituality

pchvykov · 1 Jul 2024 20:33 UTC
−1 points
10 comments · 4 min read · LW link

New Executive Team & Board — PIBBSS

Nora_Ammann · 1 Jul 2024 19:30 UTC
43 points
1 comment · 1 min read · LW link

Uncursing Civilization

Lorec · 1 Jul 2024 18:44 UTC
−6 points
2 comments · 5 min read · LW link

[Question] Self-censoring on AI x-risk discussions?

Decaeneus · 1 Jul 2024 18:24 UTC
17 points
2 comments · 1 min read · LW link

Rationalists As People Who Build Piles Of Rocks

Sable · 1 Jul 2024 10:32 UTC
9 points
0 comments · 5 min read · LW link
(affablyevil.substack.com)

How good are LLMs at doing ML on an unknown dataset?

Håvard Tveit Ihle · 1 Jul 2024 9:04 UTC
33 points
4 comments · 13 min read · LW link

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.

sevdeawesome · 1 Jul 2024 5:50 UTC
23 points
0 comments · 17 min read · LW link

Probabilistic Logic ⇔ Oracles?

Yudhister Kumar · 1 Jul 2024 5:36 UTC
15 points
0 comments · 4 min read · LW link

Important open problems in voting

Closed Limelike Curves · 1 Jul 2024 2:53 UTC
33 points
1 comment · 1 min read · LW link

Anti-Circumcision Essay 3 of 3: Now That I Think About It, Is There Actually a Space Between “Info” and “Hazard”? Isn’t It Just One Word?

Harry Stevenage · 1 Jul 2024 2:21 UTC
12 points
0 comments · 7 min read · LW link

In Defense of Lawyers Playing Their Part

Isaac King · 1 Jul 2024 1:32 UTC
32 points
9 comments · 9 min read · LW link

Anti-circumcision Essay 2 of 3: Physical and Psychological Realities

Harry Stevenage · 30 Jun 2024 22:13 UTC
12 points
5 comments · 9 min read · LW link

Review of METR’s public evaluation protocol

30 Jun 2024 22:03 UTC
10 points
0 comments · 5 min read · LW link

Superposition, Self-Modeling, and the Path to AGI: A New Perspective

Peterpiper · 30 Jun 2024 17:20 UTC
−13 points
0 comments · 2 min read · LW link

Anti-Circumcision Essay 1 of 3: According To Their Critics, Intactivists Are The Best-Behaved Protest Movement In History

Harry Stevenage · 30 Jun 2024 17:17 UTC
12 points
6 comments · 5 min read · LW link

The Xerox Parc/ARPA version of the intellectual Turing test: Class 1 vs Class 2 disagreement

hamishtodd1 · 30 Jun 2024 15:34 UTC
6 points
3 comments · 1 min read · LW link

LLMs Universally Learn a Feature Representing Token Frequency / Rarity

Sean Osier · 30 Jun 2024 2:48 UTC
12 points
5 comments · 6 min read · LW link
(github.com)

My 5-step program for losing weight

Nikita Sokolsky · 30 Jun 2024 1:05 UTC
22 points
20 comments · 5 min read · LW link
(nsokolsky.substack.com)

Datasets that change the odds you exist

dynomight · 29 Jun 2024 18:45 UTC
55 points
4 comments · 6 min read · LW link
(dynomight.net)

A “Scaling Monosemanticity” Explainer

29 Jun 2024 17:50 UTC
10 points
0 comments · 3 min read · LW link

Analysis of key AI analogies

Kevin Kohler · 29 Jun 2024 10:55 UTC
10 points
2 comments · 15 min read · LW link

Georgism Crash Course

Zero Contradictions · 29 Jun 2024 6:18 UTC
9 points
5 comments · 1 min read · LW link
(zerocontradictions.net)

Activation Pattern SVD: A proposal for SAE Interpretability

Daniel Tan · 28 Jun 2024 22:12 UTC
15 points
2 comments · 2 min read · LW link

Podcast: Elizabeth & Austin on “What Manifold was allowed to do”

Austin Chen · 28 Jun 2024 22:10 UTC
20 points
0 comments · 1 min read · LW link
(share.descript.com)

The Incredible Fentanyl-Detecting Machine

sarahconstantin · 28 Jun 2024 22:10 UTC
154 points
26 comments · 7 min read · LW link
(sarahconstantin.substack.com)

Saving Lives Reduces Over-Population—A Counter-Intuitive Non-Zero-Sum Game

James Stephen Brown · 28 Jun 2024 19:29 UTC
6 points
0 comments · 5 min read · LW link
(nonzerosum.games)

Mentorship in AGI Safety: Applications for mentorship are open!

28 Jun 2024 14:49 UTC
5 points
0 comments · 1 min read · LW link

Contra Acemoglu on AI

Maxwell Tabarrok · 28 Jun 2024 13:13 UTC
48 points
0 comments · 5 min read · LW link
(www.maximum-progress.com)

Five toy worlds to think about heritability

David Hugh-Jones · 28 Jun 2024 13:11 UTC
13 points
0 comments · 9 min read · LW link
(wyclif.substack.com)

[Question] How do natural sciences prove causation?

Kongo Landwalker · 28 Jun 2024 11:58 UTC
1 point
3 comments · 1 min read · LW link

LessWrong/ACX meetup Transilvanya tour—Sibiu

Marius Adrian Nicoară · 28 Jun 2024 11:41 UTC
1 point
1 comment · 1 min read · LW link

Bayes’ Theorem: In Search of Gold (Lesson 1)

bayesyatina · 28 Jun 2024 8:39 UTC
3 points
0 comments · 3 min read · LW link

How a chip is designed

YM · 28 Jun 2024 8:04 UTC
65 points
4 comments · 5 min read · LW link

The Wisdom of Living for 200 Years

Martin Sustrik · 28 Jun 2024 4:44 UTC
25 points
3 comments · 4 min read · LW link

A Generally Intelligent Game

snerx · 28 Jun 2024 1:31 UTC
−1 points
1 comment · 4 min read · LW link

Corrigibility = Tool-ness?

28 Jun 2024 1:19 UTC
78 points
8 comments · 9 min read · LW link

Situational Awareness

PeterMcCluskey · 28 Jun 2024 1:08 UTC
11 points
0 comments · 12 min read · LW link
(bayesianinvestor.com)

Toward a taxonomy of cognitive benchmarks for agentic AGIs

Ben Smith · 27 Jun 2024 23:50 UTC
15 points
0 comments · 5 min read · LW link

How Big a Deal are MatMul-Free Transformers?

JustisMills · 27 Jun 2024 22:28 UTC
19 points
6 comments · 5 min read · LW link
(justismills.substack.com)

Secondary forces of debt

KatjaGrace · 27 Jun 2024 21:10 UTC
77 points
18 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Distillation of ‘Do language models plan for future tokens’

TheManxLoiner · 27 Jun 2024 20:57 UTC
26 points
2 comments · 6 min read · LW link

how birds sense magnetic fields

bhauth · 27 Jun 2024 18:59 UTC
51 points
4 comments · 5 min read · LW link
(www.bhauth.com)

Representation Tuning

Christopher Ackerman · 27 Jun 2024 17:44 UTC
35 points
9 comments · 13 min read · LW link

An issue with training schemers with supervised fine-tuning

Fabien Roger · 27 Jun 2024 15:37 UTC
49 points
12 comments · 6 min read · LW link

AI #70: A Beautiful Sonnet

Zvi · 27 Jun 2024 14:40 UTC
38 points
0 comments · 44 min read · LW link
(thezvi.wordpress.com)

Detecting Genetically Engineered Viruses With Metagenomic Sequencing

jefftk · 27 Jun 2024 14:01 UTC
87 points
10 comments · 1 min read · LW link
(naobservatory.org)

Cross Robin

jefftk · 27 Jun 2024 3:10 UTC
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Live Theory Part 0: Taking Intelligence Seriously

Sahil · 26 Jun 2024 21:37 UTC
94 points
3 comments · 8 min read · LW link

Instrumental vs Terminal Desiderata

Max Harms · 26 Jun 2024 20:57 UTC
21 points
0 comments · 3 min read · LW link