
Superposition


Posts about the concept of superposition: the phenomenon in which a neural network represents more concepts (features) than it has neurons, encoding each concept as a linear combination of many neurons rather than dedicating one neuron per concept.
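
For intuition, here is a minimal numerical sketch of the phenomenon, loosely in the spirit of the "Toy Models of Superposition" setup listed below; the specific dimensions and weight matrix are illustrative choices, not code from any of these posts:

```python
# Minimal sketch: pack 5 sparse features into 2 dimensions and observe
# the interference that superposition introduces. Illustrative only.
import numpy as np

n_features, n_dims = 5, 2   # more features than dimensions

# Spread the 5 feature directions evenly around the 2D unit circle.
angles = 2 * np.pi * np.arange(n_features) / n_features
W = np.stack([np.cos(angles), np.sin(angles)])   # shape (2, 5)

x = np.zeros(n_features)
x[0] = 1.0                       # a single active (sparse) feature

h = W @ x                        # compress 5 features into 2 dimensions
x_hat = np.maximum(W.T @ h, 0)   # reconstruct as ReLU(W^T W x)

print(np.round(x_hat, 3))        # [1.    0.309 0.    0.    0.309]
# Feature 0 is recovered, but features 1 and 4 light up spuriously:
# the interference cost of storing 5 features in 2 neurons.
```

When features are active sparsely, this interference is rarely paid, which is why trained models tolerate it in exchange for representing more features.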

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
150 points
23 comments · 22 min read · LW link · 2 reviews

Some costs of superposition

Linda Linsefors · Mar 3, 2024, 4:08 PM
46 points
11 comments · 3 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

Jan 18, 2024, 9:06 PM
204 points
18 comments · 63 min read · LW link

Conditional Importance in Toy Models of Superposition

james__p · Feb 2, 2025, 8:35 PM
9 points
4 comments · 10 min read · LW link

Thoughts on Toy Models of Superposition

james__p · Feb 2, 2025, 1:52 PM
5 points
2 comments · 9 min read · LW link

AI alignment as a translation problem

Roman Leventov · Feb 5, 2024, 2:14 PM
22 points
2 comments · 3 min read · LW link

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models

Roman Leventov · Feb 6, 2024, 10:18 AM
8 points
1 comment · 4 min read · LW link
(arxiv.org)

Circuits in Superposition: Compressing many small neural networks into one

Oct 14, 2024, 1:06 PM
130 points
9 comments · 13 min read · LW link

Superposition is not “just” neuron polysemanticity

LawrenceC · Apr 26, 2024, 11:22 PM
66 points
4 comments · 13 min read · LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten · Jul 30, 2024, 4:36 PM
6 points
0 comments · 9 min read · LW link

Toy Models of Superposition: Simplified by Hand

Axel Sorensen · Sep 29, 2024, 9:19 PM
9 points
3 comments · 8 min read · LW link

Effects of Non-Uniform Sparsity on Superposition in Toy Models

Shreyans Jain · Nov 14, 2024, 4:59 PM
4 points
3 comments · 6 min read · LW link

Computational Superposition in a Toy Model of the U-AND Problem

Adam Newgas · Mar 27, 2025, 4:56 PM
17 points
2 comments · 11 min read · LW link

Toy Models of Superposition

evhub · Sep 21, 2022, 11:48 PM
69 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Growth and Form in a Toy Model of Superposition

Nov 8, 2023, 11:08 AM
89 points
7 comments · 14 min read · LW link

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceC · Jan 5, 2023, 5:54 PM
53 points
11 comments · 1 min read · LW link
(transformer-circuits.pub)

An OV-Coherent Toy Model of Attention Head Superposition

Aug 29, 2023, 7:44 PM
26 points
2 comments · 6 min read · LW link

Expanding the Scope of Superposition

Derek Larson · Sep 13, 2023, 5:38 PM
10 points
0 comments · 4 min read · LW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné · Sep 23, 2023, 4:21 PM
30 points
8 comments · 5 min read · LW link

200 COP in MI: Exploring Polysemanticity and Superposition

Neel Nanda · Jan 3, 2023, 1:52 AM
34 points
6 comments · 16 min read · LW link

Superposition and Dropout

Edoardo Pona · May 16, 2023, 7:24 AM
21 points
5 comments · 6 min read · LW link

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall · Nov 29, 2023, 12:56 PM
76 points
9 comments · 4 min read · LW link

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · Oct 7, 2023, 11:30 PM
137 points
8 comments · 4 min read · LW link

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks · Dec 5, 2023, 6:05 AM
46 points
7 comments · 5 min read · LW link

Sparse MLP Distillation

slavachalnev · Jan 15, 2024, 7:39 PM
30 points
3 comments · 6 min read · LW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · Oct 5, 2023, 9:01 PM
288 points
22 comments · 2 min read · LW link · 1 review
(transformer-circuits.pub)

Crafting Polysemantic Transformer Benchmarks with Known Circuits

Aug 23, 2024, 10:03 PM
10 points
0 comments · 25 min read · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom · Feb 2, 2024, 6:54 AM
103 points
37 comments · 15 min read · LW link

Sparse autoencoders find composed features in small toy models

Mar 14, 2024, 6:00 PM
33 points
12 comments · 15 min read · LW link

Scaling Laws and Superposition

Pavan Katta · Apr 10, 2024, 3:36 PM
9 points
4 comments · 5 min read · LW link
(www.pavankatta.com)

Superposition through Active Learning Lens

akankshanc · Sep 17, 2024, 5:32 PM
1 point
0 comments · 10 min read · LW link