RSS

Neel Nanda

Karma: 8,997

Learn­ing Multi-Level Fea­tures with Ma­tryoshka SAEs

19 Dec 2024 15:59 UTC
26 points
1 comment11 min readLW link

SAEBench: A Com­pre­hen­sive Bench­mark for Sparse Autoencoders

11 Dec 2024 6:30 UTC
71 points
1 comment2 min readLW link
(www.neuronpedia.org)

Evolu­tion­ary prompt op­ti­miza­tion for SAE fea­ture visualization

14 Nov 2024 13:06 UTC
16 points
0 comments9 min readLW link

SAEs are highly dataset de­pen­dent: a case study on the re­fusal direction

7 Nov 2024 5:22 UTC
63 points
4 comments14 min readLW link

SAE Prob­ing: What is it good for? Ab­solutely some­thing!

1 Nov 2024 19:23 UTC
31 points
0 comments11 min readLW link

Open Source Repli­ca­tion of An­thropic’s Cross­coder pa­per for model-diffing

27 Oct 2024 18:46 UTC
39 points
4 comments5 min readLW link