Why is loss stickiness deprecated? Were you just not able to see the an overlap in basins for L1 & reconstruction loss when you 4x the feature/neuron ratio (ie from 2x->8x)?
No theoretical reason—The method we used in the Interim Report to combine the two losses into one metric was pretty cursed. It’s probably just better to use L1 loss alone and reconstruction loss alone and then combine the findings. But having plots for both losses would have added more plots without much gain for the presentation. It also just seemed like the method that was hardest to discern the difference between full recovery and partial recovery because the differences were kind of subtle. In future work, some way to use the losses to measure feature recover will probably be re-introduced. It probably just won’t be the way we used in the interim report.
Why is loss stickiness deprecated? Were you just not able to see the an overlap in basins for L1 & reconstruction loss when you 4x the feature/neuron ratio (ie from 2x->8x)?
No theoretical reason—The method we used in the Interim Report to combine the two losses into one metric was pretty cursed. It’s probably just better to use L1 loss alone and reconstruction loss alone and then combine the findings. But having plots for both losses would have added more plots without much gain for the presentation. It also just seemed like the method that was hardest to discern the difference between full recovery and partial recovery because the differences were kind of subtle. In future work, some way to use the losses to measure feature recover will probably be re-introduced. It probably just won’t be the way we used in the interim report.