Senthooran Rajamanoharan

Karma: 578

Interim Research Report: Mechanisms of Awareness

May 2, 2025, 8:29 PM
43 points
6 comments · 8 min read · LW link

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

Mar 26, 2025, 7:07 PM
113 points
15 comments · 29 min read · LW link
(deepmindsafetyresearch.medium.com)

Takeaways From Our Recent Work on SAE Probing

Mar 3, 2025, 7:50 PM
30 points
0 comments · 5 min read · LW link

SAE Probing: What is it good for?

Nov 1, 2024, 7:23 PM
33 points
0 comments · 11 min read · LW link

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Jul 19, 2024, 4:10 PM
49 points
10 comments · 1 min read · LW link
(storage.googleapis.com)

Improving Dictionary Learning with Gated Sparse Autoencoders

Apr 25, 2024, 6:43 PM
63 points
38 comments · 1 min read · LW link
(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
79 points
10 comments · 8 min read · LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
73 points
0 comments · 3 min read · LW link

Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization

Jan 14, 2024, 2:06 AM
24 points
0 comments · 42 min read · LW link

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

Dec 23, 2023, 2:46 AM
18 points
0 comments · 4 min read · LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

Dec 23, 2023, 2:46 AM
22 points
0 comments · 9 min read · LW link

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)

23 Dec 2023 2:46 UTC
10 points
1 comment · 16 min read · LW link

Fact Finding: Simplifying the Circuit (Post 2)

23 Dec 2023 2:45 UTC
25 points
3 comments · 14 min read · LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

23 Dec 2023 2:44 UTC
106 points
10 comments · 22 min read · LW link · 2 reviews