This implies that there is no elephant direction separate from the attributes that happen to commonly co-occur with elephants. For example, it’s not possible to represent an elephant with an arbitrary combination of attributes, because the attributes themselves are what define the elephant direction. This is what I mean when I say the attributes are the ‘base units’ in this scheme, and ‘animals’ are just commonly co-occurring sets of attributes. This is the same as the “red triangle” problem in SAEs: https://www.lesswrong.com/posts/QoR8noAB3Mp2KBA4B/do-sparse-autoencoders-find-true-features. The animals in this framing are just invented combinations of the underlying attribute features. We would want the dictionary to learn the attributes, not arbitrary combinations of attributes, since the attributes are the true “base units” that can vary freely. E.g. in the “red triangle” problem, we want a dictionary to learn “red” and “triangle”, not “red triangle” as its own direction.
Put another way, there’s no way to represent an “elephant” in this scheme without also attaching attributes to it. Likewise, it’s not possible to differentiate between an elephant with attributes x, y, and z and a rabbit with identical attributes x, y, and z, since the sum of attributes is what you’re calling an elephant or rabbit. There’s no separate “this is a rabbit, regardless of what attributes it has” direction.
To properly represent animals and attributes, there needs to be a direction for each animal that’s separate from any attributes that animal may have, so that it’s possible to represent a “tiny furry pink elephant with no trunk” vs a “tiny furry pink rabbit with no trunk”.
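To make the contrast concrete, here’s a tiny numpy sketch (all directions are random made-up vectors, purely for illustration): in the attribute-only scheme the two animals collapse onto the same vector, while a separate animal direction keeps them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Hypothetical attribute directions (roughly orthogonal at this dimensionality).
attrs = {name: rng.standard_normal(d) for name in ["tiny", "furry", "pink", "no_trunk"]}

# Scheme A: an "animal" is nothing but the sum of its attribute directions.
elephant_a = sum(attrs.values())
rabbit_a = sum(attrs.values())
print(np.allclose(elephant_a, rabbit_a))  # True -- the two animals are indistinguishable

# Scheme B: each animal also gets its own direction, independent of its attributes.
animal_dirs = {name: rng.standard_normal(d) for name in ["elephant", "rabbit"]}
elephant_b = animal_dirs["elephant"] + sum(attrs.values())
rabbit_b = animal_dirs["rabbit"] + sum(attrs.values())
print(np.allclose(elephant_b, rabbit_b))  # False -- the animal identity survives
```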
Thank you for writing this up! I experimented briefly with group sparsity as well, but with the goal of learning the “hierarchy” of features rather than circular features like you’re doing here. I also struggled to get it to work in toy settings, but didn’t try extensively and ended up moving on to other things. I still think there must be something to group sparsity, since it’s so well studied in sparse coding and clearly works in theory.
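For reference, this is roughly the kind of penalty I have in mind when I say group sparsity: a sketch of the standard group-lasso style term, with made-up, fixed group assignments (not the setup from the post).

```python
import torch

def group_sparsity_penalty(latents: torch.Tensor, groups: list[list[int]]) -> torch.Tensor:
    """Group-lasso style penalty: sum over groups of the L2 norm of each group's
    latents. This encourages whole groups to switch off together, rather than
    penalizing each latent independently the way plain L1 does."""
    # latents: [batch, n_latents]; groups: fixed index sets chosen beforehand.
    return sum(latents[:, idx].norm(dim=-1) for idx in groups).mean()

# Toy usage with made-up groups of 4 latents each.
acts = torch.randn(8, 16).relu()
groups = [list(range(i, i + 4)) for i in range(0, 16, 4)]
penalty = group_sparsity_penalty(acts, groups)
```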
I also struggled with how to choose the groups, since for traditional group sparsity you need to set the groups beforehand. I like your idea of trying to learn the group space. For using group sparsity to recover hierarchy, I wonder if there’s a way to learn a direction for the group as a whole and project that direction out of each member of the group. The idea would be that if latents share common components, those common components should probably become their own “group” representation, and this should be repeated until the leaf nodes are mostly orthogonal to each other. There are definitely overlapping hierarchies too, which is a challenge.
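To gesture at what I mean by projecting out the group direction, here’s a toy sketch where the “group direction” is just the top principal component of the members’ directions. Everything here is made-up data, and in practice you’d presumably want to learn this jointly rather than compute it post hoc.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# Hypothetical decoder directions for latents in one "group": each is a shared
# component plus an individual component, so the raw directions overlap heavily.
shared = rng.standard_normal(d)
members = np.stack([shared + 0.5 * rng.standard_normal(d) for _ in range(5)])

# Estimate the group direction as the top principal component of the members...
_, _, vt = np.linalg.svd(members, full_matrices=False)
group_dir = vt[0] / np.linalg.norm(vt[0])

# ...and project it out of each member, leaving the "leaf" parts.
leaves = members - np.outer(members @ group_dir, group_dir)

def mean_abs_cosine(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    cos = x @ x.T
    return np.abs(cos[np.triu_indices(len(x), k=1)]).mean()

# The leaves end up far closer to orthogonal than the raw member directions.
print(mean_abs_cosine(members), mean_abs_cosine(leaves))
```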
Regardless, thank you for sharing this! There are a lot of great ideas in this post.