That’s correct. ‘Correlated features’ is ambiguous: it could mean “feature x tends to activate when feature y activates” OR “when we generate feature direction x, its distribution is correlated with feature y’s”. I don’t know whether both happen in LMs, but the former almost certainly does. The latter doesn’t really make sense in the context of LMs, since features there are learned, not sampled from a distribution.
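To make the two readings concrete, here is a minimal synthetic sketch. It assumes made-up toy numbers throughout (a 10% firing rate for x, y firing on 80% of x's inputs, a 0.9 mixing weight for directions). None of it is measured from a real LM. Sense 1 is measured across inputs as a correlation between activation magnitudes. Sense 2 only exists when directions are *generated*, as in toy setups, and shows up as a cosine similarity between the sampled directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, d_model = 10_000, 64

# Sense 1: co-activation across inputs (hypothetical firing rates).
x_active = rng.random(n_samples) < 0.1                  # x fires on ~10% of inputs
y_active = x_active & (rng.random(n_samples) < 0.8)     # y fires on ~80% of those
z_active = rng.random(n_samples) < 0.1                  # z fires independently of x
x_act = x_active * rng.random(n_samples)                # activation magnitudes
y_act = y_active * rng.random(n_samples)
z_act = z_active * rng.random(n_samples)
coactivation_corr = np.corrcoef(x_act, y_act)[0, 1]     # clearly positive
baseline_corr = np.corrcoef(x_act, z_act)[0, 1]         # close to zero

# Sense 2: directions sampled from correlated distributions (toy setups only).
dir_x = rng.standard_normal(d_model)
dir_y = 0.9 * dir_x + 0.1 * rng.standard_normal(d_model)  # y's direction depends on x's
dir_x /= np.linalg.norm(dir_x)
dir_y /= np.linalg.norm(dir_y)
direction_cosine = float(dir_x @ dir_y)                 # near 1

print(f"co-activation corr: {coactivation_corr:.2f}, baseline: {baseline_corr:.2f}")
print(f"direction cosine: {direction_cosine:.2f}")
```

In a trained LM only the first quantity has an obvious analogue, because the directions come out of training rather than being drawn with `rng`.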