Thomas Kwa comments on Exploratory Analysis of RLHF Transformers with TransformerLens

Thomas Kwa 6 Apr 2023 1:03 UTC
2 points
0
Belrose et al. found that the tuned lens is generally superior to the logit lens. Would the results change if the tuned lens were used here? My guess is probably not, since the paper shows little difference between the two techniques at later layers, but it may be worth a try.
- Curt Tigges 7 Apr 2023 16:35 UTC
  1 point
  0
  Yes, the tuned lens is an excellent tool and generally superior to the original logit lens. In this particular case, though, I don’t think it would show very different results, and in any case the logit lens is only a small part of the analysis. Still, I think it would be interesting to have some kind of integration with TransformerLens that enabled training and using the tuned lens as well.
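
For readers unfamiliar with the two techniques compared in this thread, here is a minimal NumPy sketch of how they relate. It is an illustration under stated assumptions, not the TransformerLens or tuned-lens API: the function names, shapes, and random weights are all hypothetical, the layer norm omits learned scale and bias, and the translator `(A, b)` is a random placeholder, whereas in Belrose et al. it is trained per layer so that the lens output matches the model's final-layer distribution.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    # Simplified final layer norm (no learned scale/bias; an assumption).
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def logit_lens(h, W_U):
    # Logit lens: decode an intermediate residual-stream state directly
    # with the final layer norm and the unembedding matrix.
    return layer_norm(h) @ W_U

def tuned_lens(h, A, b, W_U):
    # Tuned lens: apply a per-layer affine "translator" first, then decode
    # exactly as the logit lens does.
    return logit_lens(h @ A + b, W_U)

# Hypothetical toy sizes and random stand-ins for real activations/weights.
rng = np.random.default_rng(0)
d_model, d_vocab, n_pos = 8, 20, 3
h = rng.normal(size=(n_pos, d_model))      # residual stream at some layer
W_U = rng.normal(size=(d_model, d_vocab))  # unembedding matrix

# With an identity translator the tuned lens reduces to the logit lens.
A_identity, b_zero = np.eye(d_model), np.zeros(d_model)
logits_logit = logit_lens(h, W_U)
logits_tuned_identity = tuned_lens(h, A_identity, b_zero, W_U)
identity_matches = np.allclose(logits_logit, logits_tuned_identity)

# A non-identity translator (random here, learned in practice) changes the logits.
A_other = np.eye(d_model) + 0.5 * rng.normal(size=(d_model, d_model))
logits_tuned_other = tuned_lens(h, A_other, b_zero, W_U)
other_differs = not np.allclose(logits_logit, logits_tuned_other)
```

In this sketch the two lenses differ only by the translator, so they agree whenever the translator is close to the identity. That is one way to read the guess above that the results would change little at later layers.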