Inspired by this Shalizi paper defining local causal states. The idea is so simple and elegant I’m surprised I had never seen it before.
Basically, starting with a factored probability distribution $X_t=(X_1(t),\ldots,X_{k_t}(t))$ over a dynamical DAG $D_t$, we can apply Crutchfield's causal state construction locally to obtain a derived causal model $X'_t$, factored over the same dynamical DAG. Here $X'_t$ is defined by considering the past and future lightcones of $X_t$, written $L^-(X_t)$ and $L^+(X_t)$: all those points/variables $Y_{t_2}$ which influence $X_t$, respectively are influenced by $X_t$ (in a causal, interventional sense). Now define the equivalence relation $a_t\sim b_t$ on realizations of $L^-(X_t)$ (which includes $X_t$ by definition)[1] whenever the conditional probability distributions on the future lightcones agree: $p(L^+(X_t)\mid a_t)=p(L^+(X_t)\mid b_t)$.
These factored probability distributions over dynamical DAGs are called 'fields' by physicists. Given any field $F(x,t)$ we define a derived local causal state field $\epsilon(F)(x,t)$ in the above way. Woah!
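To make the construction concrete, here is a minimal sketch for a discrete 1D spacetime field (e.g. a cellular automaton run), assuming nearest-neighbour interactions so lightcones widen by one site per time step. The function names, the finite lightcone depth, and the exact-match merging of empirical distributions are my own simplifications, not Shalizi's algorithm.

```python
# Minimal sketch of the local causal state construction for a 1D spacetime field.
# Assumes nearest-neighbour interactions (lightcones widen by one site per step).
from collections import defaultdict
import numpy as np

def past_lightcone(field, x, t, depth):
    """Configuration of the sites that can influence (x, t), including (x, t) itself."""
    return tuple(field[t - d, x - d : x + d + 1].tobytes() for d in range(depth))

def future_lightcone(field, x, t, depth):
    """Configuration of the sites that (x, t) can influence, excluding (x, t)."""
    return tuple(field[t + d, x - d : x + d + 1].tobytes() for d in range(1, depth + 1))

def local_causal_states(field, depth=2):
    """Cluster past-lightcone configurations by their empirical conditional
    distribution over future-lightcone configurations.  This exact-match version
    is only illustrative; with finite data one would use a statistical test of
    equality instead of requiring the empirical distributions to coincide."""
    T, X = field.shape
    cond = defaultdict(lambda: defaultdict(int))  # past config -> future config -> count
    for t in range(depth, T - depth):
        for x in range(depth, X - depth):
            cond[past_lightcone(field, x, t, depth)][future_lightcone(field, x, t, depth)] += 1
    states, labels = {}, {}
    for past, futures in cond.items():
        total = sum(futures.values())
        dist = tuple(sorted((f, c / total) for f, c in futures.items()))
        labels[past] = states.setdefault(dist, len(states))
    return labels  # past-lightcone configuration -> local causal state index

# Toy usage: an i.i.d. random binary field (whose true local causal state field is
# trivial; finite data plus exact matching will over-split it, illustrating why a
# statistical test is needed in practice).
rng = np.random.default_rng(0)
eps = local_causal_states(rng.integers(0, 2, size=(200, 200), dtype=np.uint8))
```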
Some thoughts and questions
This depends on the choice of causal factorization. Sometimes these causal factorizations are given, but in full generality one probably has to consider all factorizations simultaneously, each giving a different local causal state presentation!
What is the Factored sets angle here?
In particular, given a stochastic process $\cdots\to X_{-1}\to X_0\to X_1\to\cdots$, the time-reversed process $X^{\mathrm{BackToTheFuture}}_t := X_{-t}$ can give a wildly different local causal field, as minimal predictors and retrodictors can be different. This can be exhibited by the random insertion process; see this paper.
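To spell out where the asymmetry comes from, here are the global (non-lightcone) forward and reverse causal equivalences; the notation is mine:

```latex
x_{\leq 0} \sim^{+} y_{\leq 0} \iff P(X_{>0}\mid X_{\leq 0}=x_{\leq 0}) = P(X_{>0}\mid X_{\leq 0}=y_{\leq 0})
\quad \text{(forward: same prediction)}
x_{>0} \sim^{-} y_{>0} \iff P(X_{\leq 0}\mid X_{>0}=x_{>0}) = P(X_{\leq 0}\mid X_{>0}=y_{>0})
\quad \text{(reverse: same retrodiction)}
```

The forward and reverse state sets, and hence their entropies, need not coincide; the random insertion process is a standard example where they differ.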
Let a stochastic process $X_t$ be given and define the (forward) causal states $S_t$ as usual. The key 'stochastic complexity' quantity is defined as the mutual information $I(S_0; X_{\leq 0})$ of the causal state and the past. We may generalize this definition, replacing the past with the local past lightcone, to give a local stochastic complexity.
Under the assumption that the stochastic process is ergodic, the causal states form an irreducible Hidden Markov Model, and the stochastic complexity can be calculated as the entropy of the stationary distribution.
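A minimal sketch of that computation; the two-state transition matrix is made up for illustration and is not from the paper:

```python
# Sketch: stochastic complexity as the entropy of the stationary distribution
# over causal states of an irreducible epsilon-machine / Hidden Markov Model.
import numpy as np

def stationary_distribution(T):
    """Stationary distribution of a state-to-state transition matrix T (rows sum to 1)."""
    vals, vecs = np.linalg.eig(T.T)                    # left eigenvectors of T
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])  # eigenvector for eigenvalue 1
    return pi / pi.sum()

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

T = np.array([[0.5, 0.5],       # made-up two-state machine
              [1.0, 0.0]])
pi = stationary_distribution(T)  # -> [2/3, 1/3]
C_mu = entropy_bits(pi)          # stochastic complexity, about 0.918 bits
```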
Importantly, the stochastic complexity is different from the 'excess entropy', the mutual information of the past (lightcone) and the future (lightcone).
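For contrast, the standard global definitions (presumably the lightcone versions replace the past and future with $L^-$ and $L^+$):

```latex
E = I(X_{\leq 0}\,;\,X_{>0})              % excess entropy
C_\mu = H(S_0) = I(S_0\,;\,X_{\leq 0})    % stochastic complexity
E \leq C_\mu                              % the causal state stores at least the predictive information
```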
This gives potentially a lot of very meaningful quantities to compute. These are, I think, related to correlation functions but contain more information in general.
Note that the local causal state construction is always possible—it works in full generality. Really quite incredible!
How are local causal fields related to Wentworth’s latent natural abstractions?
Shalizi conjectures that the local causal states form a Markov field, which by Hammersley-Clifford would mean we could describe the system as a Gibbs distribution! This would prove an equivalence between the Gibbs/MaxEnt/Pitman-Koopman-Darmois theory and the conditional independence story of Natural Abstraction, roughly similar to early approaches of John.
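For reference, the direction of Hammersley-Clifford being invoked: a strictly positive distribution that satisfies the Markov properties of an undirected graph $G$ factorizes as a Gibbs distribution over the cliques of $G$:

```latex
p(x) \;=\; \frac{1}{Z}\,\exp\!\Big(-\sum_{C\in\mathcal{C}(G)}\phi_C(x_C)\Big),
\qquad Z \;=\; \sum_{x}\exp\!\Big(-\sum_{C\in\mathcal{C}(G)}\phi_C(x_C)\Big).
```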
I am not sure what the status of the conjecture is at this moment. It seems rather remarkable that such a basic fact, if true, cannot be proven. I haven’t thought about it much but perhaps it is false in a subtle way.
A Markov field factorizes over an undirected graph, which seems strictly less general than a directed graph. I'm confused about this.
Given a symmetry group $G$ acting on the original causal model/field $F(x,t)=(p,D)$, the action will descend to an action $G\curvearrowright\epsilon(F)(x,t)$ on the derived local causal state field.
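My attempt at stating this precisely (my own formulation, not a claim lifted from the paper): if $g\in G$ preserves the law of $F$ and maps lightcones to lightcones, the local causal state map should be equivariant:

```latex
(g\cdot F)(x,t) := F\big(g^{-1}\cdot(x,t)\big),
\qquad
\epsilon(g\cdot F)(x,t) \;=\; \epsilon(F)\big(g^{-1}\cdot(x,t)\big) \;=\; \big(g\cdot\epsilon(F)\big)(x,t).
```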
A stationary process $X(t)$ is exactly one with a translation action by $\mathbb{Z}$. This underlies Crutchfield's original epsilon-machine construction, namely the fact that the causal states don't just form a set (plus a probability distribution) but are endowed with a monoid structure, giving a Hidden Markov Model.
Just finished the local causal states paper, it’s pretty cool! A couple of thoughts though:
I don't think the causal states factorize over the dynamical Bayes net, unlike the original random variables (which do by assumption). Shalizi doesn't claim this either.
This would require proving that each causal state is conditionally independent of its non-descendant causal states given its parents, which is a stronger statement than what is proved in Theorem 5 (which only gives conditional independence from its ancestor causal states, not necessarily from all non-descendants).
Also I don’t follow the Markov Field part—how would proving:
"if we condition on present neighbors of the patch, as well as the parents of the patch, then we get independence of the states of all points at time t or earlier" (p. 16)
… show that the causal states form a Markov field (i.e. satisfy the Markov independencies (local, pairwise, or global) induced by an undirected graph)? I'm not even sure what undirected graph the causal states would be Markov with respect to. Is it the …
… skeleton of the dynamical Bayes net? That would require proving a different theorem: "if we condition on the parents and children of the patch, then we get independence of all the other states", which would establish the local Markov independence.
… skeleton of the dynamical Bayes net, plus the edges of the original graph at each time t? That would also require proving a different theorem: "if we condition on the present neighbors, parents, and children of the patch, then we get independence of all the other states", which would establish the local Markov independence.
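For reference, the local Markov property relative to an undirected graph $H$ with neighborhoods $\mathrm{ne}(v)$, which either option would need to establish:

```latex
S_v \;\perp\!\!\!\perp\; S_{V\setminus(\{v\}\cup\mathrm{ne}(v))} \;\big|\; S_{\mathrm{ne}(v)}
\qquad \text{for every vertex } v.
```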
Also for concreteness I think I need to understand its application in detecting coherent structures in cellular automata to better appreciate this construction, though the automata theory part does go a bit over my head :p
[1] In other words, by convention the Past includes the Present $X_0$ while the Future excludes the Present.