wesg (Wes Gurnee)

Karma: 384

OR PhD student at MIT working on interpretability.

Find out more here: https://wesg.me/

SAE reconstruction errors are (empirically) pathological

wesg · 29 Mar 2024 16:37 UTC
88 points
15 comments · 8 min read · LW link

Finding Neurons in a Haystack: Case Studies with Sparse Probing

3 May 2023 13:30 UTC
33 points
5 comments · 2 min read · LW link
(arxiv.org)