rajashree

Karma: 95

Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]

6 Jan 2025 4:22 UTC
19 points
0 comments, 12 min read, LW link

Compact Proofs of Model Performance via Mechanistic Interpretability

24 Jun 2024 19:27 UTC
96 points
4 comments, 8 min read, LW link
(arxiv.org)