[Lucius] Identify better SAE sparsity penalties by reasoning about the distribution of feature activations
In sparse coding, one can derive the prior over encoded variables that a given sparsity penalty corresponds to: under MAP inference, the penalty is the negative log of the prior. E.g. an L1 penalty assumes a Laplace prior over feature activations, while a log(1+a^2) penalty assumes a Cauchy prior. Can we figure out what distribution of feature activations over the data we’d expect, and use this to derive a better sparsity penalty that improves SAE quality?
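A quick sketch of that correspondence (the notation here is assumed, not from the original: x is the input activation vector, D the decoder dictionary, a the feature activations, λ the penalty coefficient):

```latex
% MAP view: minimizing reconstruction error plus a sparsity penalty
% S(a) is, up to constants and the noise scale, maximizing the
% posterior under Gaussian noise and a prior p(a) \propto exp(-λ S(a)):
\hat{a} \;=\; \arg\min_a \,\|x - Da\|_2^2 + \lambda S(a)
       \;=\; \arg\max_a \,\log p(x \mid a) + \log p(a).
% Hence S(a) = \sum_i |a_i| corresponds to a Laplace prior on each a_i,
% and S(a) = \sum_i \log(1 + a_i^2) to a Cauchy prior (for λ = 1).
```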
This is very interesting! What prior does log(1+|a|) correspond to? And what about using ∏_i (1+|a_i|) instead of ∑_i log(1+|a_i|)? Does this only hold if we expect feature activations to be independent (rather than, say, mutually exclusive)?
A prior that doesn’t assume independence should give you a sparsity penalty that isn’t a sum of independent penalties for each activation.
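For concreteness, here is a hypothetical PyTorch sketch (the penalty functions, group structure, and encoder/decoder interface are all illustrative assumptions, not anything from the thread) of how different penalties, including a non-factorized one, would slot into an SAE loss:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: a toy SAE loss with swappable sparsity penalties.
# Each factorized penalty S(a) = sum_i s(a_i) corresponds (at MAP, up to
# normalization) to an independent prior p(a_i) ∝ exp(-λ s(a_i)).

def l1_penalty(a):
    # s(a_i) = |a_i|  ->  Laplace prior
    return a.abs().sum(dim=-1)

def cauchy_penalty(a):
    # s(a_i) = log(1 + a_i^2)  ->  Cauchy prior (for λ = 1)
    return torch.log1p(a.pow(2)).sum(dim=-1)

def log1p_abs_penalty(a):
    # s(a_i) = log(1 + |a_i|)  ->  (improper) prior p(a_i) ∝ 1 / (1 + |a_i|)
    return torch.log1p(a.abs()).sum(dim=-1)

def group_penalty(a, groups):
    # Non-factorized example: group sparsity sum_g ||a_g||_2 couples the
    # activations within each group, so the implied prior does not
    # factor into independent per-feature terms.
    return torch.stack(
        [a[..., g].pow(2).sum(dim=-1).sqrt() for g in groups], dim=-1
    ).sum(dim=-1)

def sae_loss(x, encoder, decoder, penalty, lam=1e-3):
    a = F.relu(encoder(x))   # feature activations
    x_hat = decoder(a)       # reconstruction
    recon = (x - x_hat).pow(2).sum(dim=-1)
    return (recon + lam * penalty(a)).mean()
```

Swapping `penalty=l1_penalty` for `penalty=group_penalty` changes the implied prior from independent Laplace marginals to one with within-group dependence, which is the kind of non-factorized penalty the reply above describes. On the product question: since ∑_i log(1+|a_i|) = log ∏_i (1+|a_i|), penalizing with the product is a monotone transform of the sum penalty, so the two only differ once they are combined additively with the reconstruction term.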