My take on what’s going on here is that, at random initialization, the neural network doesn’t pass information around in an easily usable way. I’m just arguing that mutual information doesn’t really capture this, and that we need some other formalization.
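A toy illustration of the point (my own sketch, not from the thread): a randomly initialized linear layer almost surely loses no information about its input, so mutual information between input and activations, conditional on the weights, is maximal — even though the representation is scrambled and not "easily usable" without knowing those weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# A randomly initialized linear layer: y = W x.
# Conditional on W, the map is deterministic and (here) invertible,
# so no information about x is lost in y -- yet y looks like noise
# to anyone who doesn't know W.
d = 8
W = rng.normal(size=(d, d))  # random init; almost surely invertible
x = rng.normal(size=d)
y = W @ x

# x is exactly recoverable from y given W: information fully preserved.
x_rec = np.linalg.solve(W, y)
print(np.allclose(x, x_rec))  # True
```

Mutual information, being invariant under invertible transformations, can’t distinguish this scrambled-but-lossless representation from a genuinely usable one.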
Yup, I think that’s probably basically correct for neural nets, at least viewing them in the simplest way. I do think there are clever ways of modeling nets which would probably make mutual information a viable modeling choice—in particular, treat the weights as unknown, so we’re talking about mutual information not conditional on the weights. But that approach isn’t obviously computationally tractable, so probably something else would be more useful.
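A minimal discrete example of the weights-as-unknown move (my own illustration, with a hypothetical one-bit "network"): take X uniform on {0, 1}, a random sign weight W uniform on {-1, +1}, and output Y = W·(2X − 1). Conditional on W, Y determines X exactly, but marginalizing over the unknown W, Y is independent of X — the unconditional mutual information registers that the signal is unusable without the weights.

```python
import itertools
import math

def mutual_info(joint):
    """Exact mutual information (bits) from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Conditional on a fixed weight (say w = +1), Y determines X: 1 bit.
joint_given_w = {}
for x in (0, 1):
    y = 1 * (2 * x - 1)
    joint_given_w[(x, y)] = joint_given_w.get((x, y), 0.0) + 0.5
print(mutual_info(joint_given_w))  # 1.0

# Marginalizing over the unknown weight, Y is a fair coin flip
# independent of X: 0 bits.
joint = {}
for x, w in itertools.product((0, 1), (-1, 1)):
    y = w * (2 * x - 1)
    joint[(x, y)] = joint.get((x, y), 0.0) + 0.25
print(mutual_info(joint))  # 0.0
```

So the unconditional quantity I(X; Y) behaves much more like the intuitive notion of "usable information" here — though, as noted above, estimating it for real nets (marginalizing over a weight distribution) is not obviously tractable.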