Can you explain how the generalized KPD (Koopman-Pitman-Darmois theorem) fits into all of this? KPD is about estimating the parameters of a model from samples via a low-dimensional statistic, whereas you are talking about estimating one part of a sample from another (distant) part of the sample via a low-dimensional statistic. Are you using KPD to rule out “high-dimensional” correlations going through the parameters of the model?
Roughly speaking, the generalized KPD says that if the long-range correlations are low-dimensional, then the whole distribution is exponential family (modulo a few “exceptional” variables). The theorem doesn’t rule out the possibility of high-dimensional correlations, but it narrows down the possible forms a lot if we can rule out high-dimensional correlations some other way. That’s what I’m hoping for: some simple/common conditions which limit the dimension of the long-range correlations, so that gKPD can apply.
This post says that those long-range correlations have to be mediated by deterministic constraints, so if the dimension of the deterministic constraints is low, then that’s one potential route. Another potential route is some kind of information-network-flow approach, i.e. if lots of information is conserved along one “direction”, then that should limit information flow along “orthogonal directions”, which would mean that long-range correlations are limited between “most” local chunks of the graph.
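For reference, here is the classic (non-generalized) KPD setting written out, using the conventional textbook notation h, f, θ, A, none of which comes from this thread. It is only meant to show what “low-dimensional statistic” means on the parameter-estimation side; it does not attempt to state the generalized theorem or its “exceptional” variables.

```latex
% Exponential family: features f(x) \in \mathbb{R}^k, natural parameters \theta \in \mathbb{R}^k
% (for continuous x, the sum in A(\theta) becomes an integral)
P(x \mid \theta) = h(x)\,\exp\!\big(\theta^{\top} f(x) - A(\theta)\big),
\qquad A(\theta) = \log \sum_{x} h(x)\,\exp\!\big(\theta^{\top} f(x)\big)

% n i.i.d. samples: the data enter the likelihood only through \sum_i f(x_i)
P(x_1,\dots,x_n \mid \theta)
  = \Big[\prod_{i=1}^{n} h(x_i)\Big]\,
    \exp\!\Big(\theta^{\top} \sum_{i=1}^{n} f(x_i) \;-\; n\,A(\theta)\Big)
```

So T = Σᵢ f(xᵢ) is a sufficient statistic whose dimension stays k no matter how large n gets. Classic KPD is the converse: under regularity conditions (e.g. the support not depending on θ), an i.i.d. family that admits a sufficient statistic of bounded dimension for all n must have this form.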
I’m still confused. What direction of GKPD do you want to use? It sounds like you want to use the low-dimensional statistic ⇒ exponential family direction. Why? What is good about some family being exponential?
Yup, that’s the direction I want. If the distributions are exponential family, then that dramatically narrows down the space of distributions which need to be represented in order to represent abstractions in general. That means much simpler data structures—e.g. feature functions and Lagrange multipliers, rather than whole distributions.
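A rough sketch of what “feature functions and Lagrange multipliers, rather than whole distributions” could look like on a finite state space. Everything here is an assumption for illustration: the names ExpFamily and fit_multipliers are hypothetical, and fitting the multipliers by plain gradient ascent on the maximum-entropy dual is just one simple choice. The point is only that the whole distribution is stored as a short list of features plus one number per feature.

```python
import math
from typing import Callable, List, Sequence

# A feature function maps a state to a real number.
Feature = Callable[[int], float]


class ExpFamily:
    """Hypothetical container: a distribution on a finite state space stored only as
    feature functions f_i and Lagrange multipliers lambda_i, with
    P(x) proportional to exp(sum_i lambda_i * f_i(x))."""

    def __init__(self, states: Sequence[int], features: List[Feature], multipliers: List[float]):
        if len(features) != len(multipliers):
            raise ValueError("need exactly one Lagrange multiplier per feature")
        self.states = list(states)
        self.features = list(features)
        self.multipliers = list(multipliers)

    def _score(self, x: int) -> float:
        # Unnormalized log-probability: sum_i lambda_i * f_i(x)
        return sum(lam * f(x) for lam, f in zip(self.multipliers, self.features))

    def log_partition(self) -> float:
        # log Z, computed with the log-sum-exp trick for numerical stability
        scores = [self._score(x) for x in self.states]
        m = max(scores)
        return m + math.log(sum(math.exp(s - m) for s in scores))

    def prob(self, x: int) -> float:
        return math.exp(self._score(x) - self.log_partition())

    def expected_features(self) -> List[float]:
        # E_P[f_i(X)] for each feature, computed by direct enumeration of states
        log_z = self.log_partition()
        totals = [0.0] * len(self.features)
        for x in self.states:
            p = math.exp(self._score(x) - log_z)
            for i, f in enumerate(self.features):
                totals[i] += p * f(x)
        return totals


def fit_multipliers(states, features, targets, steps=2000, lr=0.1) -> ExpFamily:
    """Maximum-entropy fit: choose multipliers so that E_P[f_i] matches targets[i].
    Plain gradient ascent on the concave dual lambda.t - log Z(lambda),
    whose gradient is (targets - expected features)."""
    fam = ExpFamily(states, features, [0.0] * len(features))
    for _ in range(steps):
        current = fam.expected_features()
        fam.multipliers = [
            lam + lr * (t - c) for lam, t, c in zip(fam.multipliers, targets, current)
        ]
    return fam


# Usage: a six-sided die constrained to have mean 4.5 (Jaynes' dice example).
die = fit_multipliers(range(1, 7), [lambda x: float(x)], [4.5])
print(die.multipliers, [round(die.prob(x), 4) for x in range(1, 7)])
```

Here the fitted die is fully described by one feature (the face value) and one multiplier. For a genuinely large system, the enumeration inside log_partition is exactly the part one would need to avoid; the sketch only illustrates the representation, not an efficient inference scheme.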
So, your thesis is, only exponential models give rise to nice abstractions? And, since it’s important to have abstractions, we might just as well have our agents reason exclusively in terms of exponential models?
More like: exponential family distributions are a universal property of information-at-a-distance in large complex systems. So, we can use exponential models without any loss of generality when working with information-at-a-distance in large complex systems.
That’s what I hope to show, anyway.