Joseph Bloom

Karma: 1,052

Toy Models of Feature Absorption in SAEs

7 Oct 2024 9:56 UTC
46 points
8 comments · 10 min read · LW link

[Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

25 Sep 2024 9:31 UTC
69 points
15 comments · 3 min read · LW link
(arxiv.org)

Showing SAE Latents Are Not Atomic Using Meta-SAEs

24 Aug 2024 0:56 UTC
60 points
9 comments · 20 min read · LW link

Stitching SAEs of different sizes

13 Jul 2024 17:19 UTC
39 points
12 comments · 12 min read · LW link

A Selection of Randomly Selected SAE Features

1 Apr 2024 9:09 UTC
109 points
2 comments · 4 min read · LW link

SAE-VIS: Announcement Post

31 Mar 2024 15:30 UTC
74 points
8 comments · 1 min read · LW link

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders

25 Mar 2024 21:17 UTC
91 points
7 comments · 7 min read · LW link

Understanding SAE Features with the Logit Lens

11 Mar 2024 0:16 UTC
59 points
0 comments · 14 min read · LW link

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders

27 Feb 2024 2:43 UTC
42 points
16 comments · 15 min read · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

2 Feb 2024 6:54 UTC
100 points
37 comments · 15 min read · LW link

Linear encoding of character-level information in GPT-J token embeddings

10 Nov 2023 22:19 UTC
34 points
4 comments · 28 min read · LW link

Features and Adversaries in MemoryDT

20 Oct 2023 7:32 UTC
31 points
6 comments · 25 min read · LW link

Joseph Bloom on choosing AI Alignment over bio, what many aspiring researchers get wrong, and more (interview)

17 Sep 2023 18:45 UTC
27 points
2 comments · 8 min read · LW link

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)

16 May 2023 22:59 UTC
36 points
2 comments · 16 min read · LW link

Decision Transformer Interpretability

6 Feb 2023 7:29 UTC
84 points
13 comments · 24 min read · LW link