So the eigenvector v doesn’t give you the features directly in imagespace, it gives you the network parameters which “measure” the feature?
Nope, you can straightforwardly read off the feature in imagespace, I think. Remember, the eigenvector doesn’t just show you which parameters “form” the feature through linear combination, it also shows you exactly what that linear combination is. If your eigenvector is (2, 0, −3), that means the feature in imagespace looks like taking twice the activations of the node connected to Θ_1, plus −3 times the activations of the node connected to Θ_3.
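To make the single-linear-layer reading concrete, here’s a minimal sketch with made-up numbers (nothing from the post itself): each entry of the eigenvector weights the activation of the node its parameter connects to.

```python
import numpy as np

# Hypothetical eigenvector in parameter space: one entry per parameter Θ_1, Θ_2, Θ_3.
v = np.array([2.0, 0.0, -3.0])

# Activations (for some input x) of the nodes those parameters connect to.
node_activations = np.array([0.7, 0.1, 0.4])

# The feature read off in imagespace: 2·(node for Θ_1) + 0·(node for Θ_2) − 3·(node for Θ_3).
feature_value = v @ node_activations
print(feature_value)  # 2*0.7 + 0*0.1 - 3*0.4 = 0.2
```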
Ergo, v⋅∇_v f(x,Θ+v) should measure the extent to which x exhibits the feature?
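(For concreteness: reading v⋅∇_v f(x,Θ+v) as the parameter gradient of the network output at Θ+v, contracted with v, a rough autograd sketch could look like the following. The toy architecture and the scale of v are made up purely for illustration.)

```python
import torch

# Toy stand-in for f(x, Θ); the architecture is arbitrary.
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
params = list(net.parameters())

# Hypothetical eigenvector v in parameter space, one block per parameter tensor,
# scaled small so we stay in the near-the-optimum regime discussed below.
v = [0.01 * torch.randn_like(p) for p in params]

x = torch.randn(1, 4)

# Shift the parameters to Θ + v.
with torch.no_grad():
    for p, dp in zip(params, v):
        p.add_(dp)

# Gradient of f(x, ·) at Θ + v, then contracted with v.
out = net(x).sum()
grads = torch.autograd.grad(out, params)
directional_derivative = sum((g * dp).sum() for g, dp in zip(grads, v))
print(directional_derivative.item())
```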
We’re planning to test the connection between the orthogonal features and the actual training data through something similar to this, actually, yes. See this comment and the math by John it’s replying to.
Hmm, I suppose in the single-linear-layer case, your way of transferring it to imagespace is equivalent to mine, whereas in the multi-nonlinear-layer case, I am not sure which generalization is the most appropriate.
Your way of doing it basically approximates the network to first order in the parameter changes/second order in the loss function. That’s the same as the method I’m proposing above, really, except you’re changing the features to account for the chain rule acting on the layers in front of them. You’re effectively transforming the network into an equivalent one that has a single linear layer, with the entries of ∇_v f(x,Θ) as the features.
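(Spelled out, that’s just the usual first-order Taylor expansion in the parameters, writing ∇_Θ f for the full parameter gradient: f(x, Θ+δ) ≈ f(x, Θ) + δ⋅∇_Θ f(x, Θ), so to this order the network’s response to a parameter change δ is a single linear map on δ whose coefficients are the gradient entries, evaluated per input x.)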
That’s fine to do when you’re near a global optimum (the case discussed in the main body of this post), and for tiny changes it’ll hold even in general, but for a broader understanding of the dynamics layer by layer, I think insisting on the transformation to imagespace might not be so productive.
Note that imagespace ≠ thing that is interpretable. You can recognise a dog head detector fine just by looking at its activations, no need to transpose it into imagespace somehow.