It would be great to know if they aren’t, as it affects how we estimate the number of features and subsequently the SAE expansion factor.
My impression from people working on SAEs is that the optimal number of features is very much an open question. In Towards Monosemanticity, they observe that a range of feature counts works fine: you just get feature splitting as you go bigger and feature collapse as you go smaller.
The scaling laws are not mere empirical observations
This seems like a strong claim; are you aware of arguments or evidence for it? My impression (not at all strongly held) was that it’s seen as a useful rule of thumb that may or may not continue to hold.