Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel.
Abstract:
Intelligent perception and interaction with the world hinges on internal representations that capture its underlying structure (“disentangled” or “abstract” representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along orthogonal directions, thus facilitating feature-based generalization. We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence aggregation classification tasks, canonical in the cognitive neuroscience literature. The key conceptual finding is that, by producing accurate multi-task classification estimates, a system implicitly represents a set of coordinates specifying a disentangled representation of the underlying latent state of the data it receives. The theory provides conditions for the emergence of these representations in terms of noise, number of tasks, and evidence aggregation time. We experimentally validate these predictions in RNNs trained on multi-task classification, which learn disentangled representations in the form of continuous attractors, leading to zero-shot out-of-distribution (OOD) generalization in predicting latent factors. We demonstrate the robustness of our framework across autoregressive architectures, decision boundary geometries and in tasks requiring classification confidence estimation. We find that transformers are particularly suited for disentangling representations, which might explain their unique world understanding abilities. Overall, our framework puts forth parallel processing as a general principle for the formation of cognitive maps that capture the structure of the world in both biological and artificial systems, and helps explain why ANNs often arrive at human-interpretable concepts, and how they both may acquire exceptional zero-shot generalization capabilities.
If correct, this looks like an important theoretical advance in understanding why and under what conditions neural nets can generalize outside their training distribution.