rajashree

Karma: 80

Compact Proofs of Model Performance via Mechanistic Interpretability

24 Jun 2024 19:27 UTC
95 points
3 comments, 8 min read
(arxiv.org)