Something I’m curious about from anyone involved with the challenge: are there any standard tools for performing techniques like ablation on machine learning models built in common frameworks, especially PyTorch? I often get the sense that the development gap between capabilities and interpretability shows up first and foremost in the tooling: plenty of libraries make it relatively easy to design and train models, but there is little for interpreting them unless you are able and willing to build your own tools. Am I wrong?
There are existing libraries like lucid/lucent, Captum, TransformerLens, and many others that make certain types of interpretability techniques easy to apply. But there is no standard, broad interpretability toolkit. Given the large number of interpretability methods and how quickly they become obsolete, I don’t expect one.
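That said, basic ablation doesn’t require a dedicated library: PyTorch’s forward hooks are enough. Here is a minimal sketch of zero-ablation using `register_forward_hook`; the toy model, layer choice, and ablated unit indices are all hypothetical, purely for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just for demonstration.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

def zero_ablate_units(units):
    """Return a forward hook that zeroes the given output units."""
    def hook(module, inputs, output):
        output = output.clone()   # avoid mutating the original tensor in place
        output[..., units] = 0.0  # ablate the selected units
        return output             # PyTorch substitutes the returned tensor
    return hook

x = torch.randn(4, 16)
baseline = model(x)

# Register the hook on the first linear layer, run the ablated forward
# pass, then remove the hook to restore normal behavior.
handle = model[0].register_forward_hook(zero_ablate_units([0, 3, 7]))
ablated = model(x)
handle.remove()

print((baseline - ablated).abs().max())  # size of the ablation's effect on the logits
```

Libraries like TransformerLens essentially wrap this hook pattern with named hook points, which is convenient for large transformer models, but the underlying mechanism is the same.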