Hoagy comments on Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

Hoagy 22 Jul 2024 16:00 UTC
LW: 1 AF: 1
Super interesting! Have you checked whether the average of N SAE features looks different to an SAE feature? Seems possible they live in an interesting subspace without the particular direction being meaningful.
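
To make the suggested control concrete, here is a minimal sketch of how an "average of N SAE features" direction could be built. Everything in it is an assumption for illustration: `decoder` is a stand-in list of random unit vectors rather than a real SAE decoder, and `mean_direction`, `d_model`, and `n_features` are hypothetical names, not anything from the post.

```python
import math
import random

def unit(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_direction(decoder, n, rng):
    """Average n distinct decoder directions, then renormalise."""
    rows = rng.sample(decoder, n)
    avg = [sum(col) / n for col in zip(*rows)]
    return unit(avg)

rng = random.Random(0)
d_model, n_features = 16, 64  # toy sizes, chosen arbitrarily

# Stand-in for real SAE decoder directions (assumption: random unit rows).
decoder = [
    unit([rng.gauss(0, 1) for _ in range(d_model)])
    for _ in range(n_features)
]

# Control direction: lies in the span of SAE features but is not itself a feature.
control = mean_direction(decoder, n=8, rng=rng)
```

The resulting `control` vector could then be passed through the same estimation procedure as a single SAE feature and as a random direction, to see which of the two it resembles.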

Also really curious what the scaling factors are for computing these values, in terms of the size of the dense vector and the overall model?