I know that others are keen to have a suite of SAEs at different resolutions; my (possibly controversial) instinct is that we should be looking for a single SAE which we feel appropriately captures the properties we want. Then, if we want something more coarse-grained for a different level of analysis, maybe we should use a nice hierarchical representation within that single SAE (as above)...
This seems reasonable enough to me. For what it’s worth, the other main reason why I’m particularly interested in whether different SAEs’ rate-distortion curves intersect is that if they do, then comparing two SAEs becomes more difficult: depending on the bitrate that you’re evaluating at, SAE A might be better than SAE B or vice versa. On the other hand, if SAE A’s rate-distortion curve is always above SAE B’s, then the answer to “which SAE is better?” doesn’t depend on any hyperparameter (bitrate, or conversely, acceptable loss threshold). I imagine that the former case (intersecting curves) is probably what happens in practice, in which case heuristics for acceptable loss thresholds or reasonable bitrates will probably be developed. But it’d be really nice if the latter case turned out to be true, so I’m personally curious to see whether it is.
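For concreteness, here’s a minimal sketch of what checking for this kind of dominance might look like, assuming you’ve already evaluated each SAE at a sweep of sparsity levels to get (bitrate, distortion) pairs, and taking “better” to mean lower distortion at a given bitrate (all names here are hypothetical, not from the paper):

```python
import numpy as np

def dominates(curve_a, curve_b, grid_size=100):
    """Return True if curve_a achieves distortion <= curve_b at every
    bitrate in their shared range, i.e. the curves never cross.

    curve_a, curve_b: sequences of (bitrate, distortion) pairs, one per
    sparsity level that the SAE was evaluated at.
    """
    a = np.asarray(curve_a, dtype=float)
    b = np.asarray(curve_b, dtype=float)
    # Sort each curve by bitrate so interpolation is well-defined.
    a = a[np.argsort(a[:, 0])]
    b = b[np.argsort(b[:, 0])]
    # Only compare over bitrates where both curves are defined.
    lo = max(a[0, 0], b[0, 0])
    hi = min(a[-1, 0], b[-1, 0])
    rates = np.linspace(lo, hi, grid_size)
    # Linearly interpolate each curve's distortion onto a shared grid.
    dist_a = np.interp(rates, a[:, 0], a[:, 1])
    dist_b = np.interp(rates, b[:, 0], b[:, 1])
    return bool(np.all(dist_a <= dist_b))

# If neither SAE dominates the other, the curves intersect, and which
# SAE is "better" depends on the bitrate you evaluate at:
# curves_cross = not dominates(rd_a, rd_b) and not dominates(rd_b, rd_a)
```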
This is very cool work!
One question that I have is whether JSAEs still work as well on models trained with gated MLP activation functions (e.g. ReGLU, SwiGLU). I ask because there is evidence suggesting that transcoders don’t work as well on such models (see App. B of the Gemmascope paper; I also have some unpublished results, which I’m planning to write up, that further corroborate this). It might thus be the case that the greater representational capacity of gated activation functions is what causes both transcoders and JSAEs to be unable to learn sparse input-output mappings. (If both JSAEs and transcoders perform worse on gated activation functions, then I think that would indicate that there’s something “weird” about these activation functions that should be studied further.)
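For readers who haven’t seen the distinction, here’s a minimal PyTorch sketch (layer names and dimensions are illustrative, not taken from any particular model) contrasting a standard MLP with a gated SwiGLU-style one. The elementwise product between the gate branch and the linear branch is the structural difference that plausibly makes the layer’s input-output map harder for transcoders and JSAEs to sparsify:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StandardMLP(nn.Module):
    """Plain transformer MLP: out = W_out(act(W_in(x)))."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_hidden)
        self.w_out = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # Each hidden unit is a single nonlinearity applied to a
        # linear readout of x.
        return self.w_out(F.gelu(self.w_in(x)))

class SwiGLUMLP(nn.Module):
    """Gated MLP: the hidden activation is an elementwise product of
    a gate branch and a linear branch, as in SwiGLU."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden)
        self.w_up = nn.Linear(d_model, d_hidden)
        self.w_out = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # silu(gate) * up: each hidden unit is a product of two
        # different functions of x, not a single thresholded readout.
        return self.w_out(F.silu(self.w_gate(x)) * self.w_up(x))
```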