Alex Makelov

Karma: 71

Alex Makelov Jun 16, 2024, 2:05 PM
1 point
in reply to: Jaehyuk Lim’s comment on: SAEs Discover Meaningful Features in the IOI Task
Hi, there’s code at https://github.com/amakelov/sae that covers almost everything reported in the blog post. Let me know if you have more specific questions (or open an issue), and I can point to or explain specific parts of the code!

SAEs Discover Meaningful Features in the IOI Task

Alex Makelov, Georg Lange and Neel Nanda

Jun 5, 2024, 11:48 PM

15 points

2 comments · 10 min read · LW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

Georg Lange, Alex Makelov and Neel Nanda

Aug 29, 2023, 1:04 AM

77 points

4 comments · 1 min read · LW link