Meta: I’m going through a backlog of comments I never got around to answering. Sorry it took three months.
I’ve assumed it would be possible to reweight things to focus on a better distribution of data points, because it seems like there would be some very mathematically natural ways of doing this reweighting. Is this something you’ve experimented with?
Something along those lines might work; I didn’t spend much time on it before moving to a generative model.
When you say “directly applied”, what do you mean?
The main thing I actually did was compute the SVD of the Jacobian of a generative network's output (i.e. the image) with respect to its input (i.e. the latent vector). Results of interest:
Conceptually, a near-zero singular value indicates a direction-in-image-space in which no latent parameter change will move the image, i.e. a locally-inaccessible direction. Conversely, large singular values indicate “degrees of freedom” in the image. Relevant result: if I take two different trained generative nets, and find latents for each such that they both output approximately the same image, then they both roughly agree on which directions-in-image-space are local degrees of freedom. (A rough code sketch of the Jacobian-and-SVD computation follows these results.)
By taking the SVD of the Jacobian of a chunk of the image with respect to the latent, we can figure out which directions-in-latent-space that chunk of the image is locally sensitive to. A rough local version of the natural abstraction hypothesis would then say that nonadjacent chunks of the image should strongly depend on the same small number of directions-in-latent-space, and should be “locally independent” given those few, i.e. not highly sensitive to any of the same remaining directions-in-latent-space. And that was basically correct. (The second sketch below illustrates this chunk-wise check.)
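Concretely, the core computation looks roughly like this. This is a minimal sketch, not the code I actually used: the toy generator, the latent size, and the image size are all arbitrary stand-ins; the point is just the Jacobian-then-SVD step and how to read the singular values.

```python
# Minimal illustrative sketch. Any network mapping latent -> image
# can stand in for G here; the sizes are arbitrary.
import torch

latent_dim, n_pixels = 16, 64

G = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, n_pixels),
)

z = torch.randn(latent_dim)  # a latent vector, e.g. one fit to some target image

# Jacobian of the (flattened) image with respect to the latent:
# shape (n_pixels, latent_dim).
J = torch.autograd.functional.jacobian(lambda z_: G(z_), z)

# Columns of U are directions-in-image-space; S holds the singular values.
U, S, Vh = torch.linalg.svd(J, full_matrices=False)

# Large entries of S ~ local degrees of freedom of the image;
# near-zero entries ~ directions-in-image-space along which no small
# latent change moves the image.
print(S)
```

To compare two trained generators on (approximately) the same image, one natural check is to compare the spans of each net's large-singular-value U-columns, e.g. via principal angles, the same trick used in the second sketch below.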
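And a rough sketch of the chunk-wise check, reusing G and z from above. Again, this is illustrative rather than the code I actually ran: the chunk boundaries and the top-k cutoff are arbitrary, and measuring overlap via principal angles is just one reasonable way to operationalize “sensitive to the same directions”.

```python
# Sketch of the chunk-wise check, reusing G and z from the sketch above.
# Chunk boundaries, top_k, and the overlap measure are arbitrary choices.
import torch

def chunk_latent_directions(G, z, pixel_idx, top_k=3):
    """Top-k directions-in-latent-space that a chunk of the image is
    locally most sensitive to (right singular vectors of the chunk's
    Jacobian with respect to the latent)."""
    J = torch.autograd.functional.jacobian(lambda z_: G(z_)[pixel_idx], z)
    _, S, Vh = torch.linalg.svd(J, full_matrices=False)
    return Vh[:top_k]

# Two nonadjacent chunks of the flattened image.
chunk_a = torch.arange(0, 8)
chunk_b = torch.arange(48, 56)

Va = chunk_latent_directions(G, z, chunk_a)
Vb = chunk_latent_directions(G, z, chunk_b)

# Rows of Va and Vb are orthonormal, so the singular values of Va @ Vb.T
# are the cosines of the principal angles between the two chunks' most
# sensitive latent subspaces. Values near 1 mean the chunks depend on
# roughly the same few directions-in-latent-space.
overlap = torch.linalg.svdvals(Va @ Vb.T)
print(overlap)
```

Checking the “locally independent given those few” part takes a bit more work (project the shared directions out of each chunk's Jacobian and check that the leftover sensitivities don't line up), but the overlap check above is the main move.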
To be clear, this was all “rough heuristic testing”, not really testing predictions carefully derived from the natural abstraction framework.