I did try it on a simple MNIST classifier. The main result was that all effects were dominated by a handful of misclassified or barely-correctly-classified data points, and the phenomenon I originally hypothesized just wasn’t super relevant.
Ah, I had been thinking that this method would weight these sorts of data points highly, but I wasn’t sure how critical it would be. I’ve assumed it would be possible to reweight things to focus on a better distribution of data points, because it seems like there would be some very mathematically natural ways of doing this reweighting. Is this something you’ve experimented with?
… I suppose it may make more sense to do this reweighting for my purposes than for yours.
Since then, I’ve also tried a different kind of experiment to translate interpretable features across nets, this time on a simple generative model. Basically, the experiment just directly applied the natural abstraction hypothesis to the image-distributions produced by nets trained on the same data (using a first-order approximation).
When you say “directly applied”, what do you mean?
That one worked a lot better, but didn’t really connect to peak breadth or even say much about network internals in general.
Saying much about network internals seems as difficult as ever. I get the impression that these methods can’t really do it, because they’re too local: they can say something about how the network behaves on the data manifold, but networks that are internally very different can behave the same on the data manifold, so these methods can’t distinguish those networks.
Meta: I’m going through a backlog of comments I never got around to answering. Sorry it took three months.
I’ve assumed it would be possible to reweight things to focus on a better distribution of data points, because it seems like there would be some very mathematically natural ways of doing this reweighting. Is this something you’ve experimented with?
Something along those lines might work; I didn’t spend much time on it before moving to a generative model.
When you say “directly applied”, what do you mean?
The actual main thing I did was to compute the SVD of the Jacobian of a generative network’s output (i.e. the image) with respect to its input (i.e. the latent vector). Results of interest:
Conceptually, near-zero singular values indicate directions-in-image-space in which no change to the latent will move the image (i.e. locally-inaccessible directions). Conversely, large singular values indicate “degrees of freedom” in the image. Relevant result: if I take two different trained generative nets, and find latents for each such that they both output approximately the same image, then they roughly agree on which directions-in-image-space are local degrees of freedom.
By taking the SVD of the Jacobian of a chunk of the image with respect to the latent, we can figure out which directions-in-latent-space that chunk of the image is locally sensitive to. A rough local version of the natural abstraction hypothesis would then say that nonadjacent chunks of the image should strongly depend on the same small number of directions-in-latent-space, and be “locally independent” (i.e. not highly sensitive to the same directions-in-latent-space) given those few. And that was basically correct.
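For concreteness, here’s a minimal sketch of the first check above: take the Jacobian of a generator’s output with respect to its latent, SVD it, and compare the image-space degree-of-freedom subspaces of two nets at latents matched to produce roughly the same image. The generator modules (gen_a, gen_b), the matched latents (z_a, z_b), and the subspace-overlap score are all my own stand-ins, not necessarily the actual setup:

```python
import torch
from torch.autograd.functional import jacobian

def image_space_dof(gen, z):
    """SVD of the Jacobian of the generator's output w.r.t. its latent.
    Columns of U are directions in image space; large singular values are
    local degrees of freedom, near-zero ones are locally inaccessible."""
    J = jacobian(lambda z_: gen(z_.unsqueeze(0)).flatten(), z)  # (n_pixels, latent_dim)
    U, S, _ = torch.linalg.svd(J, full_matrices=False)
    return U, S

def dof_agreement(U_a, U_b, k):
    """Mean squared cosine of the principal angles between the two nets'
    top-k degree-of-freedom subspaces (1.0 = identical subspaces)."""
    cosines = torch.linalg.svdvals(U_a[:, :k].T @ U_b[:, :k])
    return (cosines ** 2).mean()

# Hypothetical usage, with gen_a / gen_b trained generators and z_a / z_b
# latents matched so that gen_a(z_a) is approximately gen_b(z_b):
# U_a, S_a = image_space_dof(gen_a, z_a)
# U_b, S_b = image_space_dof(gen_b, z_b)
# print(dof_agreement(U_a, U_b, k=20))
```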
To be clear, this was all “rough heuristic testing”, not really testing predictions carefully derived from the natural abstraction framework.
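And a similarly rough sketch of the overlap half of the chunk-wise check: take the Jacobian of one image chunk with respect to the latent, keep the top-k right singular vectors (the latent directions that chunk is locally sensitive to), and measure how much two nonadjacent chunks’ top-k subspaces overlap. The generator gen, the latent z, the chunk slices, and the choice of k are all placeholders, and this doesn’t test the “locally independent given those few” part:

```python
import torch
from torch.autograd.functional import jacobian

def chunk_latent_directions(gen, z, chunk, k=8):
    """Top-k right singular vectors of the Jacobian of one image chunk
    w.r.t. the latent: the latent directions that chunk is locally
    sensitive to."""
    def chunk_fn(z_):
        img = gen(z_.unsqueeze(0)).squeeze(0)        # (C, H, W)
        return img[:, chunk[0], chunk[1]].flatten()
    J = jacobian(chunk_fn, z)                        # (chunk_pixels, latent_dim)
    _, _, Vh = torch.linalg.svd(J, full_matrices=False)
    return Vh[:k]                                    # (k, latent_dim)

def shared_sensitivity(gen, z, chunk1, chunk2, k=8):
    """Overlap between the latent-direction subspaces that two (possibly
    nonadjacent) chunks are sensitive to."""
    V1 = chunk_latent_directions(gen, z, chunk1, k)
    V2 = chunk_latent_directions(gen, z, chunk2, k)
    cosines = torch.linalg.svdvals(V1 @ V2.T)        # cosines of principal angles
    return (cosines ** 2).mean()

# e.g. two nonadjacent 8x8 chunks of a 64x64 image:
# shared_sensitivity(gen, z, (slice(0, 8), slice(0, 8)),
#                    (slice(48, 56), slice(48, 56)))
```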