Thane Ruthenis comments on The “Minimal Latents” Approach to Natural Abstractions

Thane Ruthenis Dec 20, 2022, 7:44 AM
LW: 5 AF: 3
0
AF
This touches on some issues I’d wanted to discuss: abstraction hierarchies, and incompatible abstraction layers.
So, here’s a new conditional independence condition for “large” systems, i.e. systems with an infinite number of $X_{i}$ ’s: given $Λ$ , any finite subset of the $X_{i}$ ’s must be approximately independent (i.e. mutual information below some small $ϵ$ ) of all but a finite number of the other $X_{i}$ ’s
Suppose we have a number of tree-instances $X_{1}, X_{2}, . . ., X_{n}$ . Given a sufficiently large $ϵ$ , we can compute a valid “general tree abstraction”. But what if we’ve picked a lower $ϵ$ , and are really committed to keeping it low, for some reason?
Here’s a trick:
We separate tree-instances into sets $S_{1}, S_{2}, . . ., S_{m}$ such that we can compute the corresponding “first-order” abstractions $Λ_{1}, Λ_{2}, . . ., Λ_{m}$ over each set, and they would be valid, in the sense that any two $X_{i}, X_{j} \in S_{k}$ would have mutual information below $ϵ$ when conditioned on $Λ_{k}$ ^[1]. Plausibly, that would recover a set of abstractions corresponding to “tree species”.
Then we repeat the trick: split the first-order abstractions $Λ_{1}, Λ_{2}, . . ., Λ_{m}$ into sets, and generate second-order abstractions $Λ_{1}^{II}, Λ_{2}^{II}, . . ., Λ_{q}^{II}$ . That may recover, say, genuses.
We do this iteratively until getting a single nth-order abstraction $Λ^{Ω}$ , standing-in for “all trees”.
I think it would all have sensible behavior. Conditioning any given tree-instance $X_{i}$ on $Λ^{Ω}$ would only explain general facts about the trees, as we wanted. Conditioning on the appropriate lower-level abstractions would explain progressively more information about $X_{i}$ . Conditioning a $X_{i} \notin S_{j}$ on $Λ_{j}^{I}$ , in turn, would turn up some information that’s in excess, or make some wrong predictions, but get the general facts right. (And you can also condition first-order abstractions on higher-order abstractions, etc.)
The question is: how do we pick $ϵ$ ? One potential answer is that, given some set of instances $X_{1}, X_{2}, . . ., X_{n}$ , we always try for the lowest $ϵ$ possible^[1]. Perhaps that’s the mathematical description of taxonomy, even? “Given a set of instances, generate the abstraction hierarchy that minimizes $ϵ$ at each abstraction-level.”
There’s a different way to go about it, though. Suppose that, instead of picking $ϵ$ and then deciding on groupings, we first split instances $X_{1}, X_{2}, . . ., X_{n}$ into sets, according to some rule? We have to be able to do that: we’ve somehow decided to abstract over these specific $X_{1}, X_{2}, . . ., X_{n}$ to begin with, so we already have some way to generate groupings. (We’ve somehow arrived at a set of tree-instances to abstract over, instead of a mixture of cars, trees, towels, random objects...)
So, we pick some “rule”, which is likely a natural abstraction in itself, or defined over one. Like “trees that are N years old” with separate set for every N, or “this tree has leaves” y/n, or “trees in %person%’s backyard” for every %person%. Then we split the instances into sets according to that rule, and try to summarize every set.
Important: that way, we may get meaningfully different $ϵ$ s for every set! For example, suppose we cluster trees by whose backyard they’re in.
- Person A has trees of several different species growing in their yard. For them, we compute $S_{A}$ , the corresponding abstraction/summary $Φ_{A}$ ^[2], and some $ϵ_{A}$ that makes $Φ_{A}$ be a valid abstraction.
- Person B only plants trees of a single species. Again, we compute $S_{B}$ , $Φ_{B}$ , $ϵ_{B}$ .
- Obviously, $ϵ_{A} > ϵ_{B}$ .
What does this approach yield us?
- It’s a tool of analysis. We can try different rules on for size, and see if that reveals any interesting data. (Do most people grow only trees of a single species in their yard?)
- It’s potentially useful for general-purpose search via constraints. Consider two different first-order abstractions, “trees of species z” $Λ_{z}$ and trees-in-my-backyard $Φ_{my}$ . Computing the second-order abstraction from them would be rather arbitrary, but it’s something we may want to do during a specific planning process!
  - (Though note that combining any two nth-order abstractions would result in a (n+1)th-order abstraction that has at least as much information as $Λ^{Ω}$ . I. e., any given valid abstraction hierarchy over a given set of instances terminates in the same max-level abstraction. I’m not sure if that’s useful.)
- It allows abstraction layers, as outlined below.
Consider humans, geopolitical entities, and ideological movements. They don’t have a clear hierarchy: while humans are what constitutes the latter two “layers”, ideological movements are not split across geopolitical lines (same ideologies can be present in different countries), and geopolitical entities are not split along ideological lines (a given government can have multiple competing ideologies). By implication, once you’re viewing the world in terms of ideologies, you can’t recover governments from this data; nor vice versa.
Similarly: As we’ve established, we can split trees by species $Λ_{1}^{I}, . . ., Λ_{n}^{I}$ and by “whose backyard they’re in” $Φ_{1}^{I}, . . ., Φ_{g}^{I}$ . But: we would not be able to recover genuses $Λ_{1}^{II}, . . ., Λ_{d}^{II}$ from the backyard-data $Φ_{1}^{I}, . . ., Φ_{g}^{I}$ ! Once we’ve committed to the backyard-classification, we’ve closed-off species-classification!
I propose calling such incompatible abstraction hierarchies abstraction layers. Behind every abstraction layer, there’s some rule by which we’re splitting instances into sets, and such rules are/are-defined-over natural abstractions, in turn.
Does all that make sense, on your model?
1. ^
  And, I guess, such that there’s at least one set with more than one instance, to forbid the uninteresting trivial case where there’s a one-member set for every initial instance. More generally, we’d want the number of sets to be “small” compared to the number of instances, in some sense of “small”.
2. ^
  Reason for the change in notation from $Λ$ will be apparent later.
3. ^
  Or maybe it’s still useful, for general-purpose search via constraints?
What links here?
- World-Model Interpretability Is All We Need by Thane Ruthenis (Jan 14, 2023, 7:37 PM; 35 points)