How sure are we that models will keep tracking Bayesian belief states, and so allow this inverse reasoning to be used, when they don’t have enough space and compute to actually track a distribution over latent states?
One obvious guess there would be that the factorization structure is exploited, e.g. independence and especially conditional independence/DAG structure. And then a big question is how distributions of conditionally independent latents in particular end up embedded.
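To make the "exploit the factorization" idea concrete, here is a minimal sketch (my own illustrative example, not from the thread): if each observation only informs one of n independent latent components, the belief state factorizes into n marginals of size s, and a Bayesian update only ever touches one of those marginals. The numbers and the `likelihoods` array below are made up.

```python
import numpy as np

# Toy setup: n independent latent components, each with s states.
# Each observation is tagged with which component it informs, so the
# posterior over the joint latent factorizes into n marginals of size s.
n, s = 4, 3
rng = np.random.default_rng(0)

# P(obs_symbol | component i in state k); purely illustrative, 5 obs symbols.
likelihoods = rng.dirichlet(np.ones(5), size=(n, s))  # shape (n, s, 5)

# Factorized belief: n separate marginals instead of one table of size s**n.
beliefs = np.full((n, s), 1.0 / s)

def update(component, obs_symbol):
    """Bayes update for one component's marginal; the other n-1 marginals
    are untouched because the latents are independent."""
    beliefs[component] *= likelihoods[component, :, obs_symbol]
    beliefs[component] /= beliefs[component].sum()

update(component=2, obs_symbol=4)
print(beliefs[2])    # updated marginal for component 2
print(beliefs.size)  # n*s numbers tracked, vs s**n entries for the full joint
```

The point is just that the stored object scales like n×s rather than s^n once the independence structure is used.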
Right. If I have n fully independent latent variables that suffice to describe the state of the system, each of which can be in one of s different states, then even tracking the probability of every state for every latent with a p-bit float will only take me about n×s×p bits. That’s actually not that bad compared to the n×log₂(s) bits for just tracking some max-likelihood guess.
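Spelling that arithmetic out with made-up numbers (n, s, p below are purely illustrative, not from the comment):

```python
import math

# Illustrative sizes: n latents, s states each, p-bit floats for probabilities.
n, s, p = 100, 16, 16

factored_belief_bits = n * s * p                    # one p-bit probability per state per latent
max_likelihood_bits = n * math.ceil(math.log2(s))   # just the argmax state per latent
full_joint_bits = (s ** n) * p                      # naive table over the whole joint state space

print(factored_belief_bits)       # 25600 bits
print(max_likelihood_bits)        # 400 bits
print(f"{full_joint_bits:.3g}")   # astronomically larger (~4e121 bits)
```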
There are some theoretical reasons to expect linear representations for variables which are causally separable / independent. See recent work from Victor Veitch’s group, e.g. Concept Algebra for (Score-Based) Text-Controlled Generative Models, The Linear Representation Hypothesis and the Geometry of Large Language Models, On the Origins of Linear Representations in Large Language Models.
Separately, there are theoretical reasons to expect convergence to approximate causal models of the data generating process, e.g. Robust agents learn causal world models.
Linearity might also make it (provably) easier to find the concepts, see Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models.
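As a toy illustration of the "linearity makes the concepts easier to find" point (my own sketch, not taken from the cited papers): if two independent binary concepts are written into a representation along fixed directions plus noise, a simple difference-of-class-means estimator recovers each direction, and the two recovered directions come out nearly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 64, 5000

# Two independent binary concepts, each (by assumption) written into the
# representation along its own fixed direction plus Gaussian noise.
true_dirs = np.linalg.qr(rng.normal(size=(d, 2)))[0].T   # two orthonormal directions, shape (2, d)
concepts = rng.integers(0, 2, size=(n_samples, 2))       # independent 0/1 concept labels
reps = concepts @ true_dirs + 0.3 * rng.normal(size=(n_samples, d))

def concept_direction(reps, labels):
    """Difference-of-means estimate of a concept's linear direction."""
    v = reps[labels == 1].mean(axis=0) - reps[labels == 0].mean(axis=0)
    return v / np.linalg.norm(v)

est = np.stack([concept_direction(reps, concepts[:, i]) for i in range(2)])
print(np.abs(est @ true_dirs.T).round(3))  # ~identity: each estimate matches its own direction
```

Difference-of-means here is just a stand-in for "a cheap linear method suffices"; the cited papers give the actual conditions under which such linear structure is expected.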