beren comments on Deep learning models might be secretly (almost) linear

beren 25 Apr 2023 13:09 UTC
6 points
4
I think this is a good intuition. I think this comes down to the natural structure of the graph and the fact that information disappears at larger distances. This means that for dense graphs such as lattices etc regions only implicitly interact through much lower dimensional max-ent variables which are then additive while for other causal graph structures such as the power-law small-world graphs that are probably sensible for many real-world datasets, you also get a similar thing where each cluster can be modelled mostly independently apart from a few long-range interactions which can be modelled as interacting with some general ‘cluster sum’. Interestingly, this is how many approximate bayesian inference algorithms for factor graphs look like—such as the region graph algorithm. ( http://pachecoj.com/courses/csc665-1/papers/Yedidia_GBP_InfoTheory05.pdf).
I definitely agree it would be really nice to have the math of this all properly worked out as I think this, as well as the region why we see power-law spectra of features so often in natural datasets (which must have a max-ent explanation) is a super common and deep feature of the world.