Authors: Huzheng Yang, James Gee, Jianbo Shi.
Abstract:
We study the intriguing connection between visual data, deep networks, and the brain. Our method creates a universal channel alignment by using brain voxel fMRI response prediction as the training objective. We discover that deep networks, trained with different objectives, share common feature channels across various models. These channels can be clustered into recurring sets, corresponding to distinct brain regions, indicating the formation of visual concepts. Tracing the clusters of channel responses onto the images, we see semantically meaningful object segments emerge, even without any supervised decoder. Furthermore, the universal feature alignment and the clustering of channels produce a picture and quantification of how visual information is processed through the different network layers, which produces precise comparisons between the networks.
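The "brain voxel fMRI response prediction" objective the abstract describes is, at its core, an encoding model: fit a map from network channel activations to measured voxel responses, then score predictions per voxel. Below is a minimal, hedged sketch of that idea using closed-form ridge regression on synthetic data; the dimensions, noise level, and the use of plain ridge regression are my illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

# Illustrative sketch of a voxelwise encoding model (NOT the paper's code).
# All data here is synthetic: X stands in for deep-network channel
# activations, Y for fMRI voxel responses to the same images.
rng = np.random.default_rng(0)
n_images, n_channels, n_voxels = 200, 64, 10

X = rng.standard_normal((n_images, n_channels))          # channel activations
W_true = rng.standard_normal((n_channels, n_voxels))     # unknown in practice
Y = X @ W_true + 0.1 * rng.standard_normal((n_images, n_voxels))  # noisy voxels

# Ridge regression in closed form: W = (X^T X + lam*I)^{-1} X^T Y.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(n_channels), X.T @ Y)
Y_hat = X @ W

def voxel_corr(a, b):
    """Per-voxel Pearson correlation between predicted and measured responses."""
    a = a - a.mean(0)
    b = b - b.mean(0)
    return (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))

scores = voxel_corr(Y_hat, Y)
print(round(float(scores.mean()), 3))
```

In this framing, the fitted weights `W` give each voxel a profile over channels, and clustering those profiles across models is what could surface the recurring channel sets the abstract mentions.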
That’s a fascinating idea. Using human brain voxel maps as guidance should presumably work for text just as it does here for images, and it seems like it would help us assess how human-like a model’s ontology and internal workings are, and to what extent the natural abstractions hypothesis is true, at least for LLMs.
Combining this with, and comparing it to, VAEs might also be very illuminating.
Alternatively, for guidance that is less costly to acquire than human brain data, how about picking a (large) reference model and using (smaller) models to predict its activations across layers at some granularity?
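That proposal can be sketched with the same machinery as the brain-encoding setup, just swapping voxels for a reference model's activations. The sketch below is purely illustrative: both models' activations are synthetic placeholders, the layer-to-layer map is an ordinary least-squares fit, and R² serves as a hypothetical alignment score.

```python
import numpy as np

# Hedged sketch of the proposal: use a large reference model's activations
# (rather than brain voxels) as the prediction target for a smaller model.
# Activations are synthetic stand-ins; in practice they would come from
# running both models on the same inputs.
rng = np.random.default_rng(1)
n_inputs, d_small, d_large = 500, 32, 128

H_small = rng.standard_normal((n_inputs, d_small))       # small-model layer
mixing = rng.standard_normal((d_small, d_large))
H_large = H_small @ mixing + 0.5 * rng.standard_normal((n_inputs, d_large))

# Fit one linear map small -> large (one such map per layer pair, in general).
W, *_ = np.linalg.lstsq(H_small, H_large, rcond=None)
pred = H_small @ W

# R^2 as an alignment score: how much of the reference model's activation
# variance the small model's features linearly explain.
ss_res = ((H_large - pred) ** 2).sum()
ss_tot = ((H_large - H_large.mean(0)) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
print(round(float(r2), 3))
```

Sweeping this score over layer pairs would give a layer-correspondence map between the two models, analogous to the layer-to-brain-region picture the paper draws.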