Fengyuan Hu

Karma: 21

Fengyuan Hu Apr 12, 2024, 1:32 AM
1 point
0
in reply to: Glen Taggart’s comment on: Normalizing Sparse Autoencoders
The additional experiment under Experiment-Performance Verification (Figure 11) compares normalized_1 and baseline_1 on layer 5, which have almost identical $L_{0}$. The result showed no observable difference.

Fengyuan Hu Apr 11, 2024, 12:14 AM
2 points
1
in reply to: Joseph Miller’s comment on: Normalizing Sparse Autoencoders
I don’t think $L_{\text{reconstruction}}$ is very informative here, as it’s heavily affected by the input batch. Both the raw $L_{\text{reconstruction}}$ and $L_{\text{clean}}$ have large variances across verification steps, and since we mainly care about how good our reconstruction is compared with the original, I think the reconstruction score is fine as is. I also don’t follow why the noisiness of $L_{0}$ is a reason to show $L_{\text{reconstruction}}$.

Fengyuan Hu Apr 10, 2024, 4:54 PM
1 point
0
in reply to: Logan Riggs’s comment on: Normalizing Sparse Autoencoders
Good point. First, the mean L0 of the experiment and the baseline are within a factor of 2 of each other, so they are in a reasonably close range. I also added a new set of figures comparing the reconstruction scores for the one layer whose L0 matches most closely between the experiment and baseline groups. Spoiler: the scores are still almost the same at the end of training. You can find it under Experiments-Performance Validation.
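To make the "within a factor of 2" comparison concrete, here is a minimal sketch, assuming L0 means the number of nonzero SAE features per token averaged over tokens. The helpers `mean_l0` and `within_factor` and the toy activations are hypothetical illustrations, not code or data from the post.

```python
def mean_l0(feature_acts):
    """Average number of nonzero SAE features per token.

    feature_acts: list of per-token feature activation vectors (lists of floats).
    """
    counts = [sum(1 for a in token if a != 0.0) for token in feature_acts]
    return sum(counts) / len(counts)


def within_factor(a, b, factor=2.0):
    """True if the larger of a and b is at most `factor` times the smaller."""
    lo, hi = sorted((a, b))
    return hi <= factor * lo


# Hypothetical toy activations for two runs (3 tokens, 4 features each).
baseline = [[0.0, 1.2, 0.0, 0.3], [0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 2.1, 0.7]]
normalized = [[0.0, 0.9, 0.4, 0.2], [0.6, 0.0, 0.1, 0.0], [0.0, 0.0, 1.8, 0.5]]

l0_base = mean_l0(baseline)    # (2 + 1 + 2) / 3
l0_norm = mean_l0(normalized)  # (3 + 2 + 2) / 3
print(l0_base, l0_norm, within_factor(l0_base, l0_norm))
```

Comparing the ratio of the larger to the smaller mean keeps the check symmetric, so it does not matter which run is treated as the reference.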

Fengyuan Hu Apr 8, 2024, 8:50 PM
1 point
0
in reply to: Joseph Miller’s comment on: Normalizing Sparse Autoencoders
Oh I see. I’ll have to look into that, because I used the AI-safety-foundation’s implementation and it doesn’t measure the KL divergence. That said, there is a validation metric called reconstruction score that measures how replacing the activations changes the model’s total loss, and the scores are pretty similar for the original and normalized SAEs.
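The comment doesn't give a formula for the reconstruction score, so the sketch below assumes one common definition: the fraction of the loss gap between zero-ablating the activations and the clean run that is recovered when the SAE reconstruction is spliced in. The function name and the loss values are hypothetical, and the AI-safety-foundation implementation may define the score differently.

```python
def reconstruction_score(loss_clean, loss_reconstructed, loss_zero_ablated):
    """Fraction of the loss gap (zero-ablation vs clean) recovered by the SAE.

    Assumed definition: 1.0 means splicing in the SAE reconstruction leaves the
    model loss unchanged; 0.0 means it is as bad as zeroing the activations out.
    """
    gap = loss_zero_ablated - loss_clean
    if gap == 0:
        raise ValueError("zero-ablation loss equals clean loss; score is undefined")
    return (loss_zero_ablated - loss_reconstructed) / gap


# Hypothetical loss values for illustration only, not results from the post.
score = reconstruction_score(loss_clean=3.0, loss_reconstructed=3.2, loss_zero_ablated=5.0)
print(score)
```

Because all three losses are computed on the same batch, a ratio of this form partly cancels the batch-to-batch variance that makes the raw $L_{\text{reconstruction}}$ and $L_{\text{clean}}$ noisy.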

Fengyuan Hu Apr 8, 2024, 2:08 PM
1 point
0
in reply to: Joseph Miller’s comment on: Normalizing Sparse Autoencoders
You can treat Figure 7 as comparing L0 and Figure 13 as comparing L2.

Fengyuan Hu Apr 8, 2024, 2:04 PM
1 point
0
in reply to: Joseph Miller’s comment on: Normalizing Sparse Autoencoders
It is a metric from the ai-safety-foundation’s implementation. It seems to measure the number of neurons in the feature activation that fire above a threshold. At least, that’s my interpretation.
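Under the interpretation above, the metric could be sketched as follows. This is an assumption, not the library's code: the helper `count_active_above` and the threshold values are hypothetical, and the real metric might instead count how often each feature fires across a batch.

```python
def count_active_above(feature_acts, threshold):
    """Per token, count SAE features whose activation exceeds `threshold`.

    feature_acts: list of per-token feature activation vectors (lists of floats).
    With threshold=0.0 this reduces to the usual per-token L0.
    """
    return [sum(1 for a in token if a > threshold) for token in feature_acts]


# Hypothetical activations for 2 tokens and 3 features.
acts = [[0.0, 0.05, 0.4], [0.2, 0.0, 0.01]]
above_small = count_active_above(acts, 0.1)  # only clearly active features
above_zero = count_active_above(acts, 0.0)   # plain nonzero count
print(above_small, above_zero)
```

A nonzero threshold ignores features with tiny activations, which is one reason such a count can read lower than L0 on the same batch.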

Normalizing Sparse Autoencoders

Fengyuan Hu Apr 8, 2024, 6:17 AM

21 points

Fengyuan Hu Apr 7, 2024, 2:20 AM
1 point
0
on: Fixing Feature Suppression in SAEs
Thanks for your amazing work! Theoretically, I think layers with higher input norms should have lower SAE L2 ratios, since they correspond to larger feature activations, which are penalized more heavily. I wonder if your data confirms this hypothesis.