Senthooran Rajamanoharan

Karma: 578

Interim Research Report: Mechanisms of Awareness

Josh Engels, Neel Nanda and Senthooran Rajamanoharan

May 2, 2025, 8:29 PM

43 points

6 comments, 8 min read

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah and Neel Nanda

Mar 26, 2025, 7:07 PM

113 points

15 comments, 29 min read

(deepmindsafetyresearch.medium.com)

Takeaways From Our Recent Work on SAE Probing

Josh Engels, Subhash Kantamneni, Senthooran Rajamanoharan and Neel Nanda

Mar 3, 2025, 7:50 PM

30 points

0 comments, 5 min read

SAE Probing: What is it good for?

Subhash Kantamneni, Josh Engels, Senthooran Rajamanoharan and Neel Nanda

Nov 1, 2024, 7:23 PM

33 points

0 comments, 11 min read

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda

Jul 19, 2024, 4:10 PM

49 points

10 comments, 1 min read

(storage.googleapis.com)

Improving Dictionary Learning with Gated Sparse Autoencoders

Senthooran Rajamanoharan, Arthur Conmy, lewis smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah and Neel Nanda

Apr 25, 2024, 6:43 PM

63 points

38 comments, 1 min read

(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma

Apr 19, 2024, 7:06 PM

79 points

10 comments, 8 min read

[Summary] Progress Update #1 from the GDM Mech Interp Team

Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma

Apr 19, 2024, 7:06 PM

73 points

0 comments, 3 min read

Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization

Jacob Dunefsky, Philippe Chlenski, Senthooran Rajamanoharan and Neel Nanda

Jan 14, 2024, 2:06 AM

24 points

0 comments, 42 min read

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

Dec 23, 2023, 2:46 AM

18 points

0 comments, 4 min read

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah

Dec 23, 2023, 2:46 AM

22 points

0 comments, 9 min read

Fact Finding: Trying to Mechanistically Understand Early MLPs (Post 3)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

23 Dec 2023 2:46 UTC

10 points

1 comment, 16 min read

Fact Finding: Simplifying the Circuit (Post 2)

Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah

23 Dec 2023 2:45 UTC

25 points

3 comments, 14 min read

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

23 Dec 2023 2:44 UTC

106 points

10 comments, 22 min read, 2 reviews