
Jason Gross

Karma: 258

[Replication] Crosscoder-based Stage-Wise Model Diffing

Mar 22, 2025, 6:35 PM
21 points
0 comments, 7 min read

Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]

Jan 6, 2025, 4:22 AM
19 points
0 comments, 12 min read

Compact Proofs of Model Performance via Mechanistic Interpretability

Jun 24, 2024, 7:27 PM
96 points
4 comments, 8 min read
(arxiv.org)