Your graphs are labelled with “test accuracy”; do you also have some training graphs you could share?
I’m specifically wondering if your train accuracy was high for both the original and encoded activations, or if e.g. the regression done over the encoded features saturated at a lower training loss.
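To illustrate what I mean, here is a minimal sketch (not your actual pipeline): fit the same logistic-regression probe on raw activations and on SAE-encoded features and compare train vs. test accuracy. The names `acts`, `sae_feats`, `labels`, and `probe_accuracies` are all placeholders, and the synthetic data just stands in for whatever activations and targets you used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracies(X, y, seed=0):
    """Fit a logistic-regression probe and return (train accuracy, test accuracy)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    return clf.score(X_tr, y_tr), clf.score(X_te, y_te)

# Placeholder data standing in for the post's activations / SAE features and labels.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))                              # raw activations
sae_feats = np.maximum(acts @ rng.normal(size=(512, 2048)), 0)   # stand-in "encoded" features
labels = (acts[:, 0] > 0).astype(int)                            # arbitrary binary target

print("raw activations  (train, test):", probe_accuracies(acts, labels))
print("encoded features (train, test):", probe_accuracies(sae_feats, labels))
```

If the probe on the encoded features saturates at a noticeably lower train accuracy than the one on raw activations, that would suggest the encoding itself is losing probe-relevant information rather than the probe overfitting.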
The relative difference in the train accuracies looks pretty similar to the one in the test accuracies. But yeah, @SenR already pointed to the low number of active features in the SAE, so that explains this nicely.