I don’t think an “actual distribution” over the activations is a thing? The distribution depends on what inputs you feed it. I don’t see in what sense there’s some underlying “true” continuous distribution we could find here.
The input distribution we measure on should be one that is representative of the behaviour of the network whose information flows we want to capture. To me, the most sensible candidate for this seemed to be the training distribution, since that’s the environment the network is constructed to operate in. I am receptive to arguments that out-of-distribution behaviour should be probed too somehow, but I’m not sure how to go about that in a principled manner.
So for now, the idea is for P_B to literally be the discrete distribution you get by directly sampling the activations of B recorded over roughly one epoch of training data.
EDIT: The estimator we planned to use just counts frequencies. If we were trying to estimate the mutual information of some underlying distribution that the training samples are statistically drawn from, that would be a problem: the estimator would be biased. But since those frequencies are literally the probabilities the actual training process uses, I think this should be correct. We’re trying to get the mutual information of draws from the training data set, not the mutual information of some statistical process that produced the training data set.
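For concreteness, here is a minimal sketch of what “just counting frequencies” means here, assuming the activations of A and B have already been recorded and discretised into hashable values (the function and variable names are just for illustration):

```python
from collections import Counter
import math

def plugin_mutual_information(samples_a, samples_b):
    """Plug-in MI estimate: treat the recorded samples themselves as the
    distribution, i.e. use raw frequencies as the probabilities."""
    n = len(samples_a)
    assert n == len(samples_b)
    counts_a = Counter(samples_a)                    # marginal counts for A
    counts_b = Counter(samples_b)                    # marginal counts for B
    counts_ab = Counter(zip(samples_a, samples_b))   # joint counts
    mi = 0.0
    for (a, b), c_ab in counts_ab.items():
        p_joint = c_ab / n
        # log of p(a, b) / (p(a) * p(b)), with every probability taken
        # directly from the recorded frequencies
        mi += p_joint * math.log2(c_ab * n / (counts_a[a] * counts_b[b]))
    return mi
```

As per the edit above, this gives the mutual information of draws from the recorded samples themselves; as an estimator of some underlying continuous distribution it would be biased.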
I don’t think an “actual distribution” over the activations is a thing? The distribution depends on what inputs you feed it.
This seems to be what Thomas is saying as well, no?
[...] look at the network activations at each layer for a bunch of different inputs. This gives you a bunch of activations sampled from the distribution of activations. From there, you can do density estimation to estimate the actual distribution over the activations.
The same way you can talk about the actual training distribution underlying the samples in the training set, it should be possible to talk about the actual distribution of the activations corresponding to a particular input distribution.
I believe Thomas is asking how you plan to do the first step of: Samples → Estimate underlying distribution → Get modularity score of estimated distribution
While from what you are describing, I’m reading: Samples → Get estimate of modularity score
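To make the first pipeline concrete, one simple (hypothetical) instance of it would be to fit a joint Gaussian to the activation samples and then compute the mutual information of that fitted distribution in closed form. This is only meant to illustrate the two-step structure, not to suggest that a Gaussian fit is the right density estimator:

```python
import numpy as np

def gaussian_mi(acts_a, acts_b):
    """Estimate I(A; B) by fitting a joint Gaussian to the samples, then
    computing the MI of the fitted distribution analytically (in nats)."""
    # acts_a: (n_samples, d_a), acts_b: (n_samples, d_b)
    joint = np.concatenate([acts_a, acts_b], axis=1)
    cov = np.cov(joint, rowvar=False)
    d_a = acts_a.shape[1]
    _, logdet_a = np.linalg.slogdet(cov[:d_a, :d_a])
    _, logdet_b = np.linalg.slogdet(cov[d_a:, d_a:])
    _, logdet_joint = np.linalg.slogdet(cov)
    # For a Gaussian, I(A; B) = 0.5 * log(det(S_A) * det(S_B) / det(S_AB))
    return 0.5 * (logdet_a + logdet_b - logdet_joint)
```

The frequency-counting approach skips the density-estimation step entirely and scores the empirical distribution directly.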
Thanks for clarifying for me, see the edit in the parent comment.
Thanks, this is indeed what I was asking.
I guess another point here is that we won’t know how different (for example) our results when sampling from the training distribution will be from our results if we just run the network on random noise and then intervene on neurons; this would be an interesting thing to test experimentally. If they’re very similar, that neatly sidesteps the problem of deciding which one is more “natural”, and if they’re very different, then that’s also interesting.
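A rough sketch of what that comparison might look like, assuming a PyTorch model and some modularity_score function over recorded activations (both of those names are hypothetical placeholders):

```python
import torch

def record_activations(model, layer, input_batches):
    """Run the model on a sequence of input batches and collect one
    layer's activations via a forward hook."""
    recorded = []
    handle = layer.register_forward_hook(
        lambda module, inputs, output: recorded.append(output.detach().cpu())
    )
    with torch.no_grad():
        for x in input_batches:
            model(x)
    handle.remove()
    return torch.cat(recorded)

# Hypothetical comparison (model, layer, train_batches and modularity_score
# are assumed to exist elsewhere): score the same layer on training inputs
# and on input-shaped random noise, then compare.
# noise_batches = [torch.randn_like(x) for x in train_batches]
# acts_train = record_activations(model, layer, train_batches)
# acts_noise = record_activations(model, layer, noise_batches)
# print(modularity_score(acts_train), modularity_score(acts_noise))
```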