I want to look at deep neural-net learning and hierarchical inference through an information-theoretic lens and try to show why hierarchical learning is such a powerful general principle. Does anyone know whether mutual information or KL divergence is the more standard measure for this kind of study, why I might prefer one over the other, or where I should look for literature beyond general surveys of deep learning?
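For context (and please correct me if I'm off), my understanding is that mutual information is itself a special case of KL divergence, namely the divergence of the joint distribution from the product of the marginals:

$$
I(X;Y) \;=\; D_{\mathrm{KL}}\!\left( p(x,y) \,\|\, p(x)\,p(y) \right) \;=\; \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
$$

So part of what I'm asking is whether the literature tends to work with the general divergence between arbitrary distributions, or with this particular instance of it between layers or between inputs and representations.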