
Superposition

Last edit: 5 Dec 2023 20:41 UTC by duck_master

Posts about the concept of superposition: neural networks representing more features than they have neurons by encoding each feature as a linear combination of many neurons, so that individual neurons respond to several unrelated features.
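
As a rough illustration of the phenomenon (a minimal numpy sketch with made-up toy dimensions, not code from any of the posts listed below): when a model stores more features than it has dimensions, the feature directions cannot all be orthogonal, so a linear readout recovers active features only up to interference from the overlapping directions.

```python
# Minimal sketch of superposition: n_features sparse features stored in
# d_model < n_features dimensions, so feature directions must overlap.
# All names and sizes here are illustrative, not from any specific post.
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model = 64, 16          # more features than dimensions

# Random unit-norm feature directions (columns of W).
W = rng.normal(size=(d_model, n_features))
W /= np.linalg.norm(W, axis=0, keepdims=True)

# A sparse feature vector: most features off, a few active.
x = np.zeros(n_features)
x[rng.choice(n_features, size=3, replace=False)] = 1.0

h = W @ x                             # compressed d_model-dim representation
x_hat = W.T @ h                       # naive linear readout

# Active features come back with value ~1 plus interference noise;
# inactive features pick up small crosstalk from the overlaps in W.T @ W.
print("active recovered:", np.round(x_hat[x > 0], 2))
print("max crosstalk on inactive:", np.round(np.abs(x_hat[x == 0]).max(), 2))
```

Increasing sparsity or the ratio of dimensions to features shrinks the interference term; that trade-off is what the toy-model posts below study.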

[Interim research report] Taking features out of superposition with sparse autoencoders

13 Dec 2022 15:41 UTC
149 points
23 comments · 22 min read · LW link · 2 reviews

Some costs of superposition

Linda Linsefors · 3 Mar 2024 16:08 UTC
46 points
11 comments · 3 min read · LW link

Circuits in Superposition: Compressing many small neural networks into one

14 Oct 2024 13:06 UTC
124 points
7 comments · 13 min read · LW link

AI alignment as a translation problem

Roman Leventov · 5 Feb 2024 14:14 UTC
22 points
2 comments · 3 min read · LW link

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models

Roman Leventov · 6 Feb 2024 10:18 UTC
8 points
1 comment · 4 min read · LW link
(arxiv.org)

Superposition is not “just” neuron polysemanticity

LawrenceC · 26 Apr 2024 23:22 UTC
64 points
4 comments · 13 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
195 points
18 comments · 63 min read · LW link

Superposition through Active Learning Lens

akankshanc · 17 Sep 2024 17:32 UTC
1 point
0 comments · 10 min read · LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten · 30 Jul 2024 16:36 UTC
6 points
0 comments · 9 min read · LW link

Toy Models of Superposition: Simplified by Hand

Axel Sorensen · 29 Sep 2024 21:19 UTC
9 points
3 comments · 8 min read · LW link

Toy Models of Superposition

evhub · 21 Sep 2022 23:48 UTC
69 points
4 comments · 5 min read · LW link · 1 review
(transformer-circuits.pub)

Growth and Form in a Toy Model of Superposition

8 Nov 2023 11:08 UTC
87 points
7 comments · 14 min read · LW link

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceC · 5 Jan 2023 17:54 UTC
53 points
11 comments · 1 min read · LW link
(transformer-circuits.pub)

An OV-Coherent Toy Model of Attention Head Superposition

29 Aug 2023 19:44 UTC
26 points
2 comments · 6 min read · LW link

Expanding the Scope of Superposition

Derek Larson · 13 Sep 2023 17:38 UTC
10 points
0 comments · 4 min read · LW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné · 23 Sep 2023 16:21 UTC
30 points
8 comments · 5 min read · LW link

200 COP in MI: Exploring Polysemanticity and Superposition

Neel Nanda · 3 Jan 2023 1:52 UTC
34 points
6 comments · 16 min read · LW link

Superposition and Dropout

Edoardo Pona · 16 May 2023 7:24 UTC
21 points
5 comments · 6 min read · LW link

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall · 29 Nov 2023 12:56 UTC
74 points
9 comments · 4 min read · LW link

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
137 points
8 comments · 4 min read · LW link

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks · 5 Dec 2023 6:05 UTC
45 points
7 comments · 5 min read · LW link

Sparse MLP Distillation

slavachalnev · 15 Jan 2024 19:39 UTC
30 points
3 comments · 6 min read · LW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds · 5 Oct 2023 21:01 UTC
287 points
21 comments · 2 min read · LW link
(transformer-circuits.pub)

Crafting Polysemantic Transformer Benchmarks with Known Circuits

23 Aug 2024 22:03 UTC
10 points
0 comments · 25 min read · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom · 2 Feb 2024 6:54 UTC
100 points
37 comments · 15 min read · LW link

Sparse autoencoders find composed features in small toy models

14 Mar 2024 18:00 UTC
33 points
12 comments · 15 min read · LW link

Scaling Laws and Superposition

Pavan Katta · 10 Apr 2024 15:36 UTC
9 points
4 comments · 5 min read · LW link
(www.pavankatta.com)