
jake_mendel

Karma: 883

Interpretability Researcher at Apollo Research

Circuits in Superposition: Compressing many small neural networks into one

14 Oct 2024 13:06 UTC
124 points
7 comments, 13 min read

jake_mendel’s Shortform

jake_mendel, 19 Sep 2024 10:37 UTC
5 points
3 comments, 1 min read

[Interim research report] Activation plateaus & sensitive directions in GPT2

5 Jul 2024 17:05 UTC
64 points
2 comments, 5 min read

SAE feature geometry is outside the superposition hypothesis

jake_mendel, 24 Jun 2024 16:07 UTC
221 points
17 comments, 11 min read

Apollo Research 1-year update

29 May 2024 17:44 UTC
93 points
0 comments, 7 min read

Interpretability: Integrated Gradients is a decent attribution method

20 May 2024 17:55 UTC
22 points
7 comments, 6 min read

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

20 May 2024 17:53 UTC
105 points
4 comments, 3 min read

A starting point for making sense of task structure (in machine learning)

24 Feb 2024 1:51 UTC
45 points
2 comments, 12 min read

Toward A Mathematical Framework for Computation in Superposition

18 Jan 2024 21:06 UTC
203 points
18 comments, 63 min read