RSS

rajashree

Karma: 106

[Repli­ca­tion] Cross­coder-based Stage-Wise Model Diffing

22 Mar 2025 18:35 UTC
21 points
0 comments7 min readLW link

Mea­sur­ing Non­lin­ear Fea­ture In­ter­ac­tions in Sparse Cross­coders [Pro­ject Pro­posal]

6 Jan 2025 4:22 UTC
19 points
0 comments12 min readLW link

Com­pact Proofs of Model Perfor­mance via Mechanis­tic Interpretability

24 Jun 2024 19:27 UTC
96 points
4 comments8 min readLW link
(arxiv.org)