[Note] On the feature geometry of hierarchical concepts
A rough summary of insightful discussions with Jake Mendel and Victor Veitch
Recent work on hierarchical feature geometry has made two specific predictions:
Proposition 1: activation space can be decomposed hierarchically into a direct sum of many subspaces, each of which reflects a layer of the hierarchy.
Proposition 2: within these subspaces, different concepts are represented as simplices.
Example of hierarchical decomposition: A dalmatian is a dog, which is a mammal, which is an animal. Writing this hierarchically, Dalmatian < Dog < Mammal < Animal. In this context, the two propositions imply that:
P1: $x_{dalmatian} = x_{animal} + x_{mammal | animal} + x_{dog | mammal} + x_{dalmatian | dog}$, and the four terms on the RHS are pairwise orthogonal (a numpy sketch of this decomposition follows this example).
P2: If we had a few different kinds of animal, like birds, mammals, and fish, the three vectors $x_{mammal | animal}, x_{fish | animal}, x_{bird | animal}$ would form a simplex (here, a 2-simplex, i.e. a triangle).
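To make P1 concrete, here is a minimal numpy sketch (the component directions are made up via a QR decomposition; nothing here is a claim about real model activations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build four pairwise-orthogonal component directions in an 8-dim "activation
# space" by orthonormalising random vectors with a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(8, 4)))
x_animal, x_mammal_g_animal, x_dog_g_mammal, x_dalmatian_g_dog = Q.T

# P1: the dalmatian representation is the sum of the per-level components.
x_dalmatian = x_animal + x_mammal_g_animal + x_dog_g_mammal + x_dalmatian_g_dog

# The four terms on the RHS are pairwise orthogonal by construction.
components = np.stack([x_animal, x_mammal_g_animal, x_dog_g_mammal, x_dalmatian_g_dog])
gram = components @ components.T
assert np.allclose(gram, np.eye(4), atol=1e-10)

# Consequence: each level can be read off independently with a dot product.
print(x_dalmatian @ x_animal)  # ~1.0, regardless of the lower-level components
```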
According to Victor Veitch, the load-bearing assumption here is that different levels of the hierarchy are disentangled, and hence models want to represent them orthogonally. I.e. $x_{animal}$ is perpendicular to $x_{mammal | animal}$. I don’t have a super rigorous explanation for why, but it’s likely because this facilitates representing / sensing each thing independently.
E.g. sometimes all that matters about a dog is that it’s an animal; it makes sense to have an abstraction of “animal” that is independent of any sub-hierarchy.
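A toy illustration of that independence (my gloss on the argument, with hypothetical vectors): a linear probe along $x_{animal}$ returns the same value for a mammal and a bird, because the child-level components are orthogonal to it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three orthonormal directions: the parent "animal" and two child-level components.
Q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
x_animal, x_mammal_g_animal, x_bird_g_animal = Q.T

x_mammal = x_animal + x_mammal_g_animal
x_bird = x_animal + x_bird_g_animal

# Probing for "animal" is blind to the sub-hierarchy: both read ~1.0.
print(x_mammal @ x_animal, x_bird @ x_animal)
```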
Jake Mendel made the interesting point that, as long as the number of features does not exceed the number of dimensions, an orthogonal set of feature vectors will satisfy P1 and P2 for any hierarchy imposed on them.
Example of P2 being satisfied. Let's say we have vectors $x_{animal} = (0,1)$ and $x_{plant} = (1,0)$, which are orthogonal. Take the parent to be their mean, $x_{\text{living thing}} = (1/2, 1/2)$. Then $x_{animal | \text{living thing}} = (-1/2, 1/2)$ and $x_{plant | \text{living thing}} = (1/2, -1/2)$ form a 1-dimensional simplex, and each is orthogonal to $x_{\text{living thing}}$.
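Checking this numerically (a direct transcription of the example above):

```python
import numpy as np

x_animal = np.array([0.0, 1.0])
x_plant = np.array([1.0, 0.0])
x_living = (x_animal + x_plant) / 2   # parent taken as the mean: (1/2, 1/2)

x_animal_cond = x_animal - x_living   # (-1/2, 1/2)
x_plant_cond = x_plant - x_living     # (1/2, -1/2)

print(x_animal_cond @ x_living)       # 0.0: conditional vector orthogonal to the parent
print(x_plant_cond @ x_living)        # 0.0
print(x_animal_cond + x_plant_cond)   # [0. 0.]: the two points form a 1-simplex centred on the parent
```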
Example of P1 being satisfied. Let's say we have four things A, B, C, D arranged in a binary tree such that AB, CD are pairs. Taking $x_{AB}$ to be the mean of $x_A$ and $x_B$, we could write $x_A = x_{AB} + x_{A | AB}$, satisfying both P1 and P2. However, if we had an alternate hierarchy where AC and BD were pairs, we could equally write $x_A = x_{AC} + x_{A | AC}$. Therefore hierarchy is in some sense an "illusion", as any hierarchy satisfies the propositions (verified in the sketch below).
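A quick numerical check of the "illusion" point, assuming orthonormal leaf features:

```python
import numpy as np

# Four orthonormal "leaf" features in R^4.
x_A, x_B, x_C, x_D = np.eye(4)

def decompose(child, sibling):
    """Parent as the mean of the pair, conditional as the residual."""
    parent = (child + sibling) / 2    # e.g. x_AB
    return parent, child - parent     # e.g. x_{A|AB}

# Hierarchy 1 pairs A with B; hierarchy 2 pairs A with C.
for parent, cond in [decompose(x_A, x_B), decompose(x_A, x_C)]:
    assert np.allclose(parent + cond, x_A)   # P1: exact decomposition
    assert np.isclose(parent @ cond, 0.0)    # P1: orthogonal components
print("x_A decomposes orthogonally under either pairing")
```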
Taking these two points together, the interesting scenario is when we have more features than dimensions, i.e. the setting of superposition. Then there are two conflicting incentives:
On one hand, models want to represent the different levels of the hierarchy orthogonally.
On the other hand, there isn’t enough “room” in the residual stream to do this; hence the model has to “trade off” what it chooses to represent orthogonally.
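To see the tension quantitatively, here is a toy sketch (my construction, not from the discussions): pack six unit "feature" vectors into three dimensions and run gradient descent on their total pairwise interference. It bottoms out well above zero, so some pairs are forced to be non-orthogonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n_feats, d = 6, 3                              # more features than dimensions
W = rng.normal(size=(n_feats, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Crude projected gradient descent on sum_{i != j} (w_i . w_j)^2,
# a stand-in for the model's incentive to minimise feature interference.
for _ in range(5000):
    G = W @ W.T
    np.fill_diagonal(G, 0.0)
    W -= 0.05 * (G @ W)                        # gradient of the off-diagonal energy (up to a constant)
    W /= np.linalg.norm(W, axis=1, keepdims=True)

G = np.abs(W @ W.T)
np.fill_diagonal(G, 0.0)
# > 0: with 6 unit vectors in 3 dims some overlap is unavoidable
# (the best possible max overlap is 1/sqrt(5) ≈ 0.447, the icosahedral packing).
print(G.max())
```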
This points to super interesting questions:
what geometry does the model adopt for features that respect a binary tree hierarchy?
what if different nodes in the hierarchy have differing importances / sparsities?
what if the tree is "uneven", i.e. some branches are deeper than others?
what if the hierarchy isn’t a tree, but only a partial order?
Experiments on toy models will probably be very informative here; a minimal starting point is sketched below.
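Here is one such sketch, in the style of Elhage et al.'s "Toy Models of Superposition", with a made-up two-level hierarchy in which parents gate their children. Everything here, from the sparsities to the architecture, is an assumption chosen just to get an experiment off the ground:

```python
import torch

torch.manual_seed(0)
n_feats, d_hidden, batch = 6, 3, 1024

def sample_hierarchical(n):
    """Features 0,1 are parents; 2,3 are children of 0; 4,5 are children of 1.
    A child can only be active when its parent is (sparse, hierarchical data)."""
    parents = (torch.rand(n, 2) < 0.3).float()
    children = (torch.rand(n, 4) < 0.5).float()
    children[:, :2] *= parents[:, 0:1]
    children[:, 2:] *= parents[:, 1:2]
    return torch.cat([parents, children], dim=1)

# Tied-weight autoencoder with a d_hidden bottleneck: the model must choose
# which features (and which levels of the hierarchy) to represent orthogonally.
W = torch.randn(n_feats, d_hidden, requires_grad=True)
b = torch.zeros(n_feats, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=1e-2)

for _ in range(5000):
    x = sample_hierarchical(batch)
    x_hat = torch.relu(x @ W @ W.T + b)
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Inspect the learned feature geometry: which pairs end up (nearly) orthogonal?
with torch.no_grad():
    Wn = W / W.norm(dim=1, keepdim=True)
    print((Wn @ Wn.T).round(decimals=2))
```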