200 Concrete Open Problems in Mechanistic Interpretability
Neel Nanda, Dec 28, 2022, 12:45 AM

Concrete Steps to Get Started in Transformer Mechanistic Interpretability. Neel Nanda, Dec 25, 2022, 10:21 PM. 57 points, 7 comments, 12 min read (www.neelnanda.io)
200 Concrete Open Problems in Mechanistic Interpretability: Introduction. Neel Nanda, Dec 28, 2022, 9:06 PM. 106 points, 0 comments, 10 min read
200 COP in MI: The Case for Analysing Toy Language Models. Neel Nanda, Dec 28, 2022, 9:07 PM. 40 points, 3 comments, 7 min read
200 COP in MI: Looking for Circuits in the Wild. Neel Nanda, Dec 29, 2022, 8:59 PM. 16 points, 5 comments, 13 min read
200 COP in MI: Interpreting Algorithmic Problems. Neel Nanda, Dec 31, 2022, 7:55 PM. 33 points, 2 comments, 10 min read
200 COP in MI: Exploring Polysemanticity and Superposition. Neel Nanda, Jan 3, 2023, 1:52 AM. 34 points, 6 comments, 16 min read
200 COP in MI: Analysing Training Dynamics. Neel Nanda, Jan 4, 2023, 4:08 PM. 16 points, 0 comments, 14 min read
200 COP in MI: Techniques, Tooling and Automation. Neel Nanda, Jan 6, 2023, 3:08 PM. 13 points, 0 comments, 15 min read
200 COP in MI: Image Model Interpretability. Neel Nanda, Jan 8, 2023, 2:53 PM. 18 points, 3 comments, 6 min read
200 COP in MI: Interpreting Reinforcement Learning. Neel Nanda, Jan 10, 2023, 5:37 PM. 25 points, 1 comment, 10 min read
200 COP in MI: Studying Learned Features in Language Models. Neel Nanda, Jan 19, 2023, 3:48 AM. 24 points, 2 comments, 30 min read