rajashree

Karma: 106

[Replication] Crosscoder-based Stage-Wise Model Diffing

Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree and Jason Gross

22 Mar 2025 18:35 UTC

21 points

0 comments, 7 min read, LW link

Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]

Jason Gross and rajashree

6 Jan 2025 4:22 UTC

19 points

0 comments, 12 min read, LW link

Compact Proofs of Model Performance via Mechanistic Interpretability

LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross

24 Jun 2024 19:27 UTC

96 points

4 comments, 8 min read, LW link

(arxiv.org)