Computing the description length using the entropy of a feature activation’s probability distribution is flexible enough to distinguish between different types of distributions. For example, a binary distribution has an entropy of at most one bit, while distributions spread out over more values have larger entropies.
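As a minimal sketch of what this could look like in practice (assuming the activations have already been discretized into bins; the function name and example bin probabilities are purely illustrative):

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]  # zero-probability bins contribute nothing
    return float(-np.sum(probs * np.log2(probs)))

# A binary (on/off) feature has an entropy of at most one bit.
print(entropy_bits([0.9, 0.1]))   # ~0.47 bits
print(entropy_bits([0.5, 0.5]))   # exactly 1 bit

# An activation spread over more discretized values has a larger
# entropy, i.e. a longer description length.
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2 bits
```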
Yep, that’s completely true. Thanks for the reminder!