On the question of quantizing different feature activations differently: computing the description length using the entropy of a feature activation’s probability distribution is flexible enough to distinguish different types of distributions. For example, a binary distribution would have an entropy of one bit or less, and distributions spread out over more values would have larger entropies.
In our methodology, the effective float precision matters because it sets the bin width for the histogram of a feature’s activations, which is then used to compute the entropy. We used the same effective float precision for all features, found by rounding activations to different precisions until the reconstruction or cross-entropy loss changed by some amount.
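To make the role of the bin width concrete, here is a minimal sketch of the entropy calculation, assuming activations are simply rounded to the nearest multiple of a bin width before counting. The function name `quantized_entropy_bits`, the bin width of 0.5, and the toy data are illustrative assumptions, not the actual code or settings we used.

```python
import numpy as np

def quantized_entropy_bits(activations, bin_width):
    """Shannon entropy (in bits) of activations rounded to multiples of bin_width.

    Hypothetical helper for illustration only.
    """
    acts = np.asarray(activations, dtype=np.float64)
    # Map each activation to an integer bin index; bin_width plays the role
    # of the effective float precision.
    bins = np.round(acts / bin_width).astype(np.int64)
    # Empirical probability of each occupied bin.
    _, counts = np.unique(bins, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

# Toy data (assumed): a binary feature and a feature spread over many values.
binary_feature = np.array([0.0, 1.0] * 500)
rng = np.random.default_rng(0)
spread_feature = rng.uniform(0.0, 8.0, size=10_000)

h_binary = quantized_entropy_bits(binary_feature, bin_width=0.5)
h_spread = quantized_entropy_bits(spread_feature, bin_width=0.5)
print(h_binary, h_spread)
```

With the same bin width for both, the binary feature comes out at one bit while the spread-out feature costs several bits, so a single shared precision still separates the two kinds of distribution.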
A hacky solution might be to look at the top activations using encoder directions AND decoder directions. We can think of the encoder as giving a “specific” meaning and the decoder a “broad” meaning, potentially overlapping other latents. Discrepancies between the two sets of top activations would indicate absorption.
Untied encoders give sparser activations by effectively removing activations that can be better attributed to other latents. So an encoder direction’s top activations can only be understood in the context of all the other latents.
Top activations using the decoder direction would be less sparse but give a fuller picture that is not dependent on what other latents are learned. The activations may be less monosemantic though, especially as you move towards weaker activations.
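The comparison could be sketched roughly as follows, assuming a standard ReLU SAE where the encoder readout is the latent's post-ReLU activation and the decoder readout is the projection of the input onto the unit-normalised decoder direction. The function `top_k_overlap`, the weight shapes, and the two-dimensional toy weights are all hypothetical, chosen so that latent 0's encoder is suppressed whenever a second feature is present.

```python
import numpy as np

def top_k_overlap(resid, w_enc, b_enc, w_dec, latent, k):
    """Fraction of shared top-k inputs under encoder vs decoder readouts.

    Assumed shapes: resid (n, d_model), w_enc (d_model, n_latents),
    b_enc (n_latents,), w_dec (n_latents, d_model).
    """
    # Encoder readout: post-ReLU activation of this latent.
    enc_acts = np.maximum(resid @ w_enc[:, latent] + b_enc[latent], 0.0)
    # Decoder readout: projection onto the unit-normalised decoder direction.
    direction = w_dec[latent] / np.linalg.norm(w_dec[latent])
    dec_acts = resid @ direction
    top_enc = set(np.argsort(-enc_acts)[:k].tolist())
    top_dec = set(np.argsort(-dec_acts)[:k].tolist())
    return len(top_enc & top_dec) / k, top_enc, top_dec

# Toy setup (assumed): latent 0's decoder reads dimension 0 only, but its
# encoder subtracts dimension 1, as if latent 1 had absorbed those cases.
w_enc = np.array([[1.0, 0.0],
                  [-1.0, 1.0]])
b_enc = np.zeros(2)
w_dec = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
resid = np.array([[1.0, 0.0],
                  [0.9, 0.0],
                  [1.2, 1.2],
                  [1.1, 1.0],
                  [0.0, 1.0]])

overlap, top_enc, top_dec = top_k_overlap(resid, w_enc, b_enc, w_dec, latent=0, k=2)
print(overlap, top_enc, top_dec)
```

In this toy case the two top-2 sets are disjoint: the inputs that score highest along the decoder direction are exactly the ones the encoder zeroes out, which is the kind of discrepancy that would hint at absorption. Real cases would presumably show partial overlap, so the overlap fraction is better read as a rough signal than a test.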