I greatly appreciated the time invested in coding the interactive demos; they help clarify the underlying concepts and remind me of Colah's posts.
Questions:
Are you going to release tools for the interpretation of other models?
How might one visualize other modalities, like audio or web actions?
Have you considered developing a generalized interpretability framework that could scale these techniques across different architectures and modalities? A unified “interpretability platform” could help broaden access and grow a dedicated community around your work.
Right now, there’s a lot to exploit with CLIP and ViTs, so that will be the focus for a while. We may expand to Flamingo or other models if there is demand.
Other modalities would be fascinating; I imagine they have their own idiosyncrasies. I would be interested in audio in the future, but not at the expense of first exploiting vision.
Ideally, yes; a unified interp framework for any modality is the north star. I do think this will be a community effort. Research in language built on findings from many different groups and institutions; vision and other modalities are currently just not in the same place.