Hm, this makes me wonder: so if I understand correctly, if you take an eigenvector v of the Hessian with a large eigenvalue, that corresponds to a feature the network has learned is important for its loss. More specifically, it corresponds to parameters of the network (i.e. an axis in networkspace) that measure features which correlate in imagespace.
So the eigenvector v doesn’t give you the features directly in imagespace, it gives you the network parameters which “measure” the feature? I wonder if one could translate this to imagespace. Taking a stab at it: given an image x, v is by definition the parameter axis that measures the feature, so how much the output for x changes as Θ moves along v should be proportional to the feature’s presence in x? Ergo, v⋅∇_v f(x,Θ+v) should measure the extent to which x exhibits the feature?
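Concretely, something like this is what I have in mind (just a rough sketch, not anything from the post; I’m assuming the output f(x,Θ) is a scalar, that Θ and v live in the flattened parameter space, and the PyTorch naming is my own):

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def feature_score(model, x, v, theta):
    """Rough estimate of v⋅∇_v f(x,Θ+v) for a single input x.

    `theta` and `v` are flattened parameter vectors (v a Hessian eigenvector);
    assumes model(x) returns a scalar. Illustrative sketch, my own naming.
    """
    params = list(model.parameters())
    vector_to_parameters(theta + v, params)       # move the weights to Θ + v
    out = model(x)                                # f(x, Θ + v)
    grads = torch.autograd.grad(out, params)      # ∂f/∂Θ at Θ + v
    score = torch.dot(v, parameters_to_vector(grads))
    vector_to_parameters(theta, params)           # restore the original weights
    return score.item()
```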
Not sure if this is useful, or relevant to your post. Maybe it’s something I should experiment with.
So the eigenvector v doesn’t give you the features directly in imagespace, it gives you the network parameters which “measure” the feature?
Nope, you can straightforwardly read off the feature in imagespace, I think. Remember, the eigenvector doesn’t just show you which parameters “form” the feature through linear combination, it also shows you exactly what that linear combination is. If your eigenvector is (2, 0, −3), that means the feature in imagespace looks like taking twice the activations of the node connected to Θ_1, plus −3 times the activations of the node connected to Θ_3.
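As a toy illustration of the single-linear-layer case (my own setup, just to make the arithmetic concrete):

```python
import numpy as np

# Toy single linear layer f(x, Θ) = Θ·x, so parameter Θ_i multiplies input node x_i.
v = np.array([2.0, 0.0, -3.0])   # Hessian eigenvector in parameter space

def feature_value(x):
    # The linear combination the eigenvector specifies: 2 * (activation feeding Θ_1)
    # + 0 * (activation feeding Θ_2) - 3 * (activation feeding Θ_3).
    # With a single linear layer those activations are just the inputs, so the
    # feature in imagespace is the direction v itself.
    return v @ x

x = np.array([0.5, 0.1, -0.2])
print(feature_value(x))   # 2*0.5 + 0*0.1 - 3*(-0.2) = 1.6
```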
Ergo, v⋅∇_v f(x,Θ+v) should measure the extent to which x exhibits the feature?
We’re planning to test the connection between the orthogonal features and the actual training data through something similar to this, actually, yes. See this comment and the math by John it’s replying to.
Hmm, I suppose in the single-linear-layer case, your way of transferring it to imagespace is equivalent to mine, whereas in the multi-nonlinear-layer case, I am not sure which generalization is the most appropriate.
Your way of doing it basically approximates the network to first order in the parameter changes / second order in the loss function. That’s the same as the method I’m proposing above really, except you’re changing the features to account for the chain rule acting on the layers in front of them. You’re effectively transforming the network into an equivalent one that has a single linear layer, with the entries of ∇_v f(x,Θ) as the features.
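In code, the first-order picture I mean looks roughly like this (a sketch under my reading, with the features taken to be the entries of the parameter gradient; naming is mine, not from the post):

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def linearized_output(model, theta, x, delta):
    """First-order ("equivalent single linear layer") view of the network at Θ.

    Returns (exact, approx) with approx = f(x, Θ) + delta·∇f(x, Θ): a linear
    model whose features for input x are the entries of the parameter gradient.
    Assumes model(x) is a scalar; illustrative sketch only.
    """
    params = list(model.parameters())

    vector_to_parameters(theta, params)
    out = model(x)
    g = parameters_to_vector(torch.autograd.grad(out, params))
    approx = out.detach() + torch.dot(delta, g)       # the "single linear layer"

    vector_to_parameters(theta + delta, params)       # exact output at Θ + delta
    with torch.no_grad():
        exact = model(x)
    vector_to_parameters(theta, params)               # restore the original weights
    return exact, approx
```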
That’s fine to do when you’re near a global optimum (the case discussed in the main body of this post), and for tiny changes it’ll hold even in general. But for a broader, layer-by-layer understanding of the dynamics, I think insisting on the transformation to imagespace might not be so productive.
Note that imagespace ≠ “thing that is interpretable”. You can recognise a dog head detector fine just by looking at its activations, no need to transpose it into imagespace somehow.
(Wait, I say “imagespace” due to thinking too much about image classifiers as the canonical example of a neural network, but of course other inputs can be given to the NN too.)
(And further, it seems like one could identify the feature in pixelspace by taking the gradient of v⋅∇_v f(x,Θ+v) with respect to the pixels? Might be useful for interpretability? Not sure.)
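Roughly, I mean something like this (again only a sketch with my own naming, assuming a scalar output and a single input tensor x):

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def feature_saliency(model, x, v, theta):
    """Gradient of v⋅∇_v f(x,Θ+v) with respect to the pixels of x.

    Returns a tensor shaped like x: a rough map of where in pixelspace the
    feature picked out by the Hessian eigenvector v lives. Sketch only.
    """
    params = list(model.parameters())
    vector_to_parameters(theta + v, params)             # evaluate at Θ + v
    x = x.clone().requires_grad_(True)
    out = model(x)                                      # f(x, Θ + v), scalar
    grads = torch.autograd.grad(out, params, create_graph=True)
    score = torch.dot(v, parameters_to_vector(grads))   # v dotted with ∂f/∂Θ
    (pixel_grad,) = torch.autograd.grad(score, x)       # d(score) / d(pixels)
    vector_to_parameters(theta, params)                 # restore the original weights
    return pixel_grad.detach()
```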