MiguelDev comments on AutoInterpretation Finds Sparse Coding Beats Alternatives

MiguelDev 17 Jul 2023 1:55 UTC
0 points
Illustrating the way in which many residual stream basis directions have high correlation when considering top and random fragments together, but low to zero correlation when looking within a fragment.

I think the final layer's output determines how complex the prediction is, and that prediction in turn determines the vocabulary the model generates its output from. I would vote for the random fragment chart (1st image) here.
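
As a rough illustration of the effect described in the quoted caption, here is a toy sketch using synthetic data, not the post's actual activations or analysis. It assumes two hypothetical basis directions whose activations are independent within each group of fragments, but whose mean is shifted upward on "top" fragments. The function name, the offset, and the sample sizes are all made up for the example, and treating each group as the "within" unit is a simplification.

```python
import numpy as np

def pooled_and_within_correlation(n_per_group=5000, offset=5.0, seed=0):
    """Toy model: two directions, independent within each group of fragments.

    Returns (pooled_corr, random_corr, top_corr).
    All data here is synthetic; nothing comes from a real model.
    """
    rng = np.random.default_rng(seed)

    # "Random" fragments: both directions hover around zero, independently.
    random_frag = rng.normal(0.0, 1.0, size=(n_per_group, 2))
    # "Top" fragments: both directions are shifted up by the same offset,
    # but are still independent of each other inside the group.
    top_frag = rng.normal(offset, 1.0, size=(n_per_group, 2))

    pooled = np.vstack([random_frag, top_frag])

    def corr(x):
        # Pearson correlation between the two columns.
        return float(np.corrcoef(x[:, 0], x[:, 1])[0, 1])

    return corr(pooled), corr(random_frag), corr(top_frag)

pooled, within_random, within_top = pooled_and_within_correlation()
print(f"pooled: {pooled:.2f}, random only: {within_random:.2f}, top only: {within_top:.2f}")
```

In this toy setup the pooled correlation comes out high while both within-group correlations stay near zero, because the only thing the two directions share is the difference in group means.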