Something I’m curious about from anyone involved with the challenge: are there any standard tools for performing techniques like ablation on machine learning models built in common frameworks, especially PyTorch? I often get the sense that the development gap between capabilities and interpretability shows up first and foremost in the tooling: plenty of libraries make it relatively easy to design and train models, but there is little for interpreting them unless you are able and willing to build your own tools. Am I wrong?
There are existing libraries like lucid/lucent, Captum, TransformerLens, and many others that make certain types of interpretability techniques easy to apply. But there is no standard, broad interpretability toolkit. Given the large number of interpretability methods and how quickly they become obsolete, I don’t expect one.
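That said, basic ablation doesn’t require a dedicated library: PyTorch’s forward hooks are enough. Here is a minimal sketch of zero-ablation using `register_forward_hook`; the toy model, layer choice, and ablated unit indices are all hypothetical, purely for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just for demonstration.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

def zero_ablate_units(units):
    """Return a forward hook that zeroes the given output units."""
    def hook(module, inputs, output):
        output = output.clone()   # avoid mutating the original tensor in place
        output[..., units] = 0.0  # ablate the selected units
        return output             # PyTorch substitutes the returned tensor
    return hook

x = torch.randn(4, 16)
baseline = model(x)

# Register the hook on the first linear layer, run the ablated forward
# pass, then remove the hook to restore normal behavior.
handle = model[0].register_forward_hook(zero_ablate_units([0, 3, 7]))
ablated = model(x)
handle.remove()

print((baseline - ablated).abs().max())  # size of the ablation's effect on the logits
```

Libraries like TransformerLens essentially wrap this hook pattern with named hook points, which is convenient for large transformer models, but the underlying mechanism is the same.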