Belrose et al. found that the tuned lens is generally superior to the logit lens. Would the results change if the tuned lens were used here? My guess is probably not, since the paper shows little difference between the two techniques when they are applied to later layers, but it may be worth a try.
Yes, the tuned lens is an excellent tool and generally superior to the original logit lens. In this particular case, however, I don’t think it would show very different results (and in any case, the logit lens is only a small part of the analysis). Still, I think it would be interesting to have some kind of integration with TransformerLens that supports training and using tuned lenses as well.
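For anyone unfamiliar with the distinction, here is my understanding of it. The logit lens decodes an intermediate residual-stream state h directly with the model’s final norm and unembedding. The tuned lens first passes h through a per-layer learned affine "translator" (trained so the decoded distribution matches the model’s final output) and then decodes the same way. Below is a minimal numpy sketch under loud assumptions: a parameter-free LayerNorm, random hypothetical weights, a residual-form translator `h + A h + b`, and no training loop. It is not the `tuned-lens` library’s API or TransformerLens code.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Assumption: parameter-free LayerNorm over the last axis (no learned gain/bias)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def logit_lens(h, W_U):
    # Decode an intermediate residual state directly with the final norm + unembedding
    return layer_norm(h) @ W_U

def tuned_lens(h, W_U, A, b):
    # Apply a per-layer affine translator first (residual form: h + A h + b),
    # then decode exactly as the logit lens does; A and b would be learned per layer
    translated = h + h @ A.T + b
    return logit_lens(translated, W_U)

rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
h = rng.normal(size=(3, d_model))          # hypothetical residual states for 3 positions
W_U = rng.normal(size=(d_model, d_vocab))  # hypothetical unembedding matrix

# With the translator at zero (identity map), the tuned lens reduces to the logit lens
A0, b0 = np.zeros((d_model, d_model)), np.zeros(d_model)
same = np.allclose(tuned_lens(h, W_U, A0, b0), logit_lens(h, W_U))

# With a nonzero translator, the decoded logits differ
A1, b1 = 0.1 * rng.normal(size=(d_model, d_model)), 0.1 * rng.normal(size=d_model)
different = not np.allclose(tuned_lens(h, W_U, A1, b1), logit_lens(h, W_U))

print(same, different)  # True True
```

The point of the identity case is that the two lenses can only disagree to the extent that training moves the translator away from the identity map.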