rajashree

Karma: 79

Compact Proofs of Model Performance via Mechanistic Interpretability

24 Jun 2024 19:27 UTC
94 points
3 comments
8 min read
LW link
(arxiv.org)