Michael Pearce comments on Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

Michael Pearce 20 Sep 2024 23:18 UTC
2 points
0
On the question of quantizing different feature activations differently: Computing the description length using the entropy of a feature activation’s probability distribution is flexible enough to distinguish different types of distributions. For example, a binary distribution would have an entropy of one bit or less, and distributions spread out over more values would have larger entropies.
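
To make the "one bit or less" point concrete, here is the standard Shannon entropy written out. The symbols are chosen for illustration and are not taken from the post: $p_i$ is the probability that a quantized activation falls in bin $i$, $p$ is the firing probability of a binary feature, and $k$ is the number of equally likely values.

```latex
H = -\sum_i p_i \log_2 p_i
\qquad
H_{\text{binary}}(p) = -p \log_2 p - (1-p)\log_2(1-p) \le 1 \text{ bit}
\qquad
H_{\text{uniform}}(k) = \log_2 k \text{ bits}
```

The binary bound is reached only at $p = 1/2$, so a sparse binary feature costs well under one bit per activation, while a feature spread evenly over $k$ values costs $\log_2 k$ bits.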

In our methodology, the effective float precision matters because it sets the bin width of the histogram of a feature’s activations, which is then used to compute the entropy. We used the same effective float precision for all features, which we found by rounding activations to different precisions until the reconstruction or cross-entropy loss changed by some amount.
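
As a rough illustration of the histogram step, here is a minimal Python sketch, not the code we actually used. The function name `entropy_bits`, the synthetic activations, and the bin width of 0.1 are all assumptions made for the example, and rounding to a fixed bin width stands in for the effective float precision. The sketch does not cover how that precision is chosen from the loss.

```python
import numpy as np

def entropy_bits(activations, bin_width):
    """Shannon entropy (in bits) of activations quantized into bins of width bin_width."""
    # Round each activation to the index of its nearest bin (hypothetical quantizer).
    bins = np.round(np.asarray(activations, dtype=float) / bin_width).astype(np.int64)
    # Histogram over the occupied bins only, so every probability is nonzero.
    _, counts = np.unique(bins, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

# Synthetic stand-ins for two kinds of feature (assumed for illustration).
rng = np.random.default_rng(0)
binary_feature = rng.integers(0, 2, size=10_000).astype(float)  # values in {0, 1}
spread_feature = rng.exponential(scale=1.0, size=10_000)        # many distinct values

bin_width = 0.1  # same precision for every feature, as in the comment above
binary_bits = entropy_bits(binary_feature, bin_width)
spread_bits = entropy_bits(spread_feature, bin_width)
coarse_bits = entropy_bits(spread_feature, 1.0)  # wider bins for comparison

print(f"binary feature: {binary_bits:.3f} bits")
print(f"spread feature: {spread_bits:.3f} bits")
print(f"spread feature, coarser bins: {coarse_bits:.3f} bits")
```

With the same bin width for both features, the binary one stays at or below one bit while the spread-out one costs more, and widening the bins lowers the entropy of the spread-out feature.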
- Jacob Dunefsky 30 Sep 2024 5:13 UTC
  1 point
  0
  
  Computing the description length using the entropy of a feature activation’s probability distribution is flexible enough to distinguish different types of distributions. For example, a binary distribution would have an entropy of one bit or less, and distributions spread out over more values would have larger entropies.
  
  Yep, that’s completely true. Thanks for the reminder!