rajashree

Karma: 79

Compact Proofs of Model Performance via Mechanistic Interpretability

LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross

24 Jun 2024 19:27 UTC

94 points

3 comments · 8 min read · LW link

(arxiv.org)