Choosing better sparsity penalties than L1 (Upcoming post - Ben Wright & Lee Sharkey): [...] We propose a simple fix: Use Lp with 0<p<1 instead of L1, which seems to be a Pareto improvement over L1
Is there any particular justification for using Lp rather than, e.g., tanh (cf. Anthropic’s Feb update), log1psum (acts.log1p().sum()), or prod1p (acts.log1p().sum().exp())? The agenda I’m pursuing (write-up in progress) gives theoretical justification for a sparsity penalty that explodes combinatorially in the number of active features, in any case where the downstream computation performed over the features does not distribute linearly over them. The product-based sparsity penalty seems to perform a bit better than both L0.5 and tanh on a toy example (sample size 1); see this colab.
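For concreteness, here is a minimal sketch of the penalties being compared, assuming `acts` is a tensor of non-negative (post-ReLU) SAE feature activations for a single example; the function names, the `p` and `scale` parameters, and the batch handling are my own choices for illustration, while the `log1psum` and `prod1p` expressions are taken directly from the comment above.

```python
import torch

def lp_penalty(acts: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    # Lp penalty with 0 < p < 1: concave, so many small activations cost
    # more than one large activation of the same total magnitude.
    return acts.abs().pow(p).sum()

def tanh_penalty(acts: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # Saturating penalty (cf. Anthropic's Feb update): each feature's
    # contribution approaches 1 once its activation exceeds ~1/scale,
    # so this roughly counts active features.
    return torch.tanh(scale * acts.abs()).sum()

def log1psum_penalty(acts: torch.Tensor) -> torch.Tensor:
    # log1psum: sum_i log(1 + a_i); grows sublinearly per feature.
    return acts.log1p().sum()

def prod1p_penalty(acts: torch.Tensor) -> torch.Tensor:
    # prod1p: exp(sum_i log(1 + a_i)) = prod_i (1 + a_i).
    # Each additional active feature multiplies the penalty, so the cost
    # grows combinatorially in the number of active features.
    return acts.log1p().sum().exp()
```

The point of `prod1p` is visible in the last line: because the contributions multiply rather than add, activating a second feature scales the whole penalty, which is the combinatorial behaviour the theoretical argument asks for when downstream computation does not decompose linearly over features.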