One obvious guess here is that the factorization structure gets exploited, e.g. independence and especially conditional independence / DAG structure. A big question is then how the distributions of conditionally independent latents in particular end up embedded.
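To make that question a bit more concrete, here's a minimal synthetic sketch (my own toy setup, not taken from any of the papers below): latents generated from a small DAG, a random linear map standing in for learned representations, and one linear probe per latent, so one can look at how well each latent is linearly decodable and how the recovered probe directions relate geometrically.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, d_latent, d_repr = 5000, 3, 64

# Toy DAG: z0 -> z1 and z0 -> z2, so z1 and z2 are conditionally
# independent given z0 but marginally correlated.
z0 = rng.normal(size=n)
z1 = 0.8 * z0 + 0.6 * rng.normal(size=n)
z2 = 0.8 * z0 + 0.6 * rng.normal(size=n)
Z = np.stack([z0, z1, z2], axis=1)

# Stand-in for learned representations: a fixed random linear embedding
# of the latents plus isotropic noise (an assumption, not a trained model).
A = rng.normal(size=(d_latent, d_repr))
H = Z @ A + 0.1 * rng.normal(size=(n, d_repr))

# One linear probe per latent; compare decodability and probe geometry.
probes = [LinearRegression().fit(H, Z[:, i]) for i in range(d_latent)]
W = np.stack([p.coef_ / np.linalg.norm(p.coef_) for p in probes])
print("probe R^2 per latent:",
      [round(p.score(H, Z[:, i]), 3) for i, p in enumerate(probes)])
print("cosine similarities between probe directions:\n", np.round(W @ W.T, 3))
```

The same kind of diagnostic could in principle be run on real model activations, with the generating DAG known from a synthetic or controlled data distribution.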
Separately, there are theoretical reasons to expect convergence to approximate causal models of the data generating process, e.g. Robust agents learn causal world models.
There are some theoretical reasons to expect linear representations for variables which are causally separable / independent. See recent work from Victor Veitch’s group, e.g. Concept Algebra for (Score-Based) Text-Controlled Generative Models, The Linear Representation Hypothesis and the Geometry of Large Language Models, On the Origins of Linear Representations in Large Language Models.
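As a toy illustration of what the linear picture buys you (again a synthetic setup made up for this note, not the construction from those papers): if two binary concepts are each written into the representation along a fixed direction, a simple difference of class-conditional means recovers each direction, and you can then inspect how the two concept directions sit relative to each other.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 32

# Hypothetical setup: two independent binary concepts, each written into
# the representation along its own fixed unit direction, plus noise.
u_a = rng.normal(size=d); u_a /= np.linalg.norm(u_a)
u_b = rng.normal(size=d); u_b /= np.linalg.norm(u_b)
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)
H = np.outer(a, u_a) + np.outer(b, u_b) + 0.3 * rng.normal(size=(n, d))

def concept_direction(H, labels):
    """Difference-of-means estimate of a binary concept direction."""
    v = H[labels == 1].mean(axis=0) - H[labels == 0].mean(axis=0)
    return v / np.linalg.norm(v)

v_a, v_b = concept_direction(H, a), concept_direction(H, b)
print("recovered vs. true (concept a):", round(float(v_a @ u_a), 3))
print("recovered vs. true (concept b):", round(float(v_b @ u_b), 3))
print("cosine between recovered concept directions:", round(float(v_a @ v_b), 3))
```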
Linearity might also make it (provably) easier to find the concepts; see Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models.
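For intuition on why linear structure helps with findability (this is just the classic ICA story, not the actual identifiability argument from that paper): if independent non-Gaussian concepts are mixed linearly into the representation, they can be recovered up to permutation, sign, and scale without any concept labels.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n, k, d = 5000, 3, 16

S = rng.laplace(size=(n, k))             # independent non-Gaussian "concepts"
A = rng.normal(size=(k, d))              # unknown linear embedding
H = S @ A + 0.01 * rng.normal(size=(n, d))  # observed "representations"

# Unsupervised recovery of the concepts via ICA.
S_hat = FastICA(n_components=k, random_state=0).fit_transform(H)

# Recovery is only defined up to permutation/sign/scale, so check the
# best |correlation| of each true concept with a recovered component.
C = np.abs(np.corrcoef(S.T, S_hat.T)[:k, k:])
print("best |corr| per concept:", np.round(C.max(axis=1), 3))
```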