200 Concrete Open Problems in Mechanistic Interpretability
Neel Nanda, 28 Dec 2022 0:45 UTC

Concrete Steps to Get Started in Transformer Mechanistic Interpretability
Neel Nanda, 25 Dec 2022 22:21 UTC, 56 points, 7 comments, 12 min read, LW link (www.neelnanda.io)

200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Neel Nanda, 28 Dec 2022 21:06 UTC, 106 points, 0 comments, 10 min read, LW link

200 COP in MI: The Case for Analysing Toy Language Models
Neel Nanda, 28 Dec 2022 21:07 UTC, 40 points, 3 comments, 7 min read, LW link

200 COP in MI: Looking for Circuits in the Wild
Neel Nanda, 29 Dec 2022 20:59 UTC, 16 points, 5 comments, 13 min read, LW link

200 COP in MI: Interpreting Algorithmic Problems
Neel Nanda, 31 Dec 2022 19:55 UTC, 33 points, 2 comments, 10 min read, LW link

200 COP in MI: Exploring Polysemanticity and Superposition
Neel Nanda, 3 Jan 2023 1:52 UTC, 34 points, 6 comments, 16 min read, LW link

200 COP in MI: Analysing Training Dynamics
Neel Nanda, 4 Jan 2023 16:08 UTC, 16 points, 0 comments, 14 min read, LW link

200 COP in MI: Techniques, Tooling and Automation
Neel Nanda, 6 Jan 2023 15:08 UTC, 13 points, 0 comments, 15 min read, LW link

200 COP in MI: Image Model Interpretability
Neel Nanda, 8 Jan 2023 14:53 UTC, 18 points, 3 comments, 6 min read, LW link

200 COP in MI: Interpreting Reinforcement Learning
Neel Nanda, 10 Jan 2023 17:37 UTC, 25 points, 1 comment, 10 min read, LW link

200 COP in MI: Studying Learned Features in Language Models
Neel Nanda, 19 Jan 2023 3:48 UTC, 24 points, 2 comments, 30 min read, LW link