What is the correct notion of Information Flow?
This shortform was inspired by a very intriguing talk by James Crutchfield on the use of Shannon information theory in understanding complex systems.
There is an idea to define information flow as transfer entropy.
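For reference, the simplest (history-length-1) version of the transfer entropy from $X$ to $Y$ is the conditional mutual information between the next value of $Y$ and the current value of $X$, given the current value of $Y$:

$$T_{X \to Y} \;=\; I(Y_{n+1}\,;\,X_n \mid Y_n) \;=\; H(Y_{n+1} \mid Y_n) \;-\; H(Y_{n+1} \mid Y_n, X_n).$$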
One of the problems with this is the following example, due to Crutchfield. Example. We have two stochastic processes $X = X_0, X_1, \ldots, X_n$ and $Y = Y_0, Y_1, \ldots, Y_n$, where $Y_0$ is a fair coin flip and all the $X_i$ are independent fair coin flips. $Y_n$ is defined as $X_{n-1} + Y_{n-1} \pmod 2$, or said differently, $Y_n = X_{n-1} \,\mathrm{XOR}\, Y_{n-1}$. If you write it out, the transfer entropy says that $X_i$ is sending 1 bit of information to $Y_{i+1}$ at each time step, but this picture is muddied by the fact that we can just as well understand this information as coming from $Y_0$.
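To make this concrete, here is a minimal sketch (mine, not from the talk) that enumerates the joint distribution of $(Y_{n-1}, X_{n-1}, Y_n)$ for this process and evaluates the lag-1 transfer entropy: it comes out to exactly 1 bit per time step, even though $Y_n$ is equally well a deterministic function of $Y_0$ and $X_0, \ldots, X_{n-1}$.

```python
import itertools
from collections import defaultdict
from math import log2

# Joint distribution of (Y_{n-1}, X_{n-1}, Y_n) for the XOR process:
# Y_{n-1} and X_{n-1} are independent fair coins and Y_n = X_{n-1} XOR Y_{n-1}.
joint = defaultdict(float)
for y_prev, x_prev in itertools.product([0, 1], repeat=2):
    joint[(y_prev, x_prev, x_prev ^ y_prev)] += 0.25

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(dist, idxs):
    out = defaultdict(float)
    for outcome, p in dist.items():
        out[tuple(outcome[i] for i in idxs)] += p
    return out

# Lag-1 transfer entropy T_{X->Y} = I(Y_n ; X_{n-1} | Y_{n-1}),
# written as a signed sum of joint entropies.
te = (entropy(marginal(joint, [0, 2]))    # H(Y_{n-1}, Y_n)
      + entropy(marginal(joint, [0, 1]))  # H(Y_{n-1}, X_{n-1})
      - entropy(marginal(joint, [0]))     # H(Y_{n-1})
      - entropy(joint))                   # H(Y_{n-1}, X_{n-1}, Y_n)
print(te)  # 1.0 -- one bit "flowing" from X to Y at every time step
```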
Initially I thought this was a simple case of applying Pearl's causality, but that turned out to be too naive. In a way I don't completely understand, the issue is tied up with higher-order dependencies; see this paper by James & Crutchfield. They give examples of joint probability distributions over variables $X, Y, Z$ where the higher-order dependencies mean that they cannot be understood as having a Pearlian DAG structure.
Moreover, they point out that very different joint distributions can look identical through the lens of Shannon information theory: conditional mutual information does not distinguish these distributions. The problem seems to be that Shannon information theory does not deal well with XORed variables and higher-order dependencies.
A natural way to attack/understand higher-order dependencies would be to look at ‘interaction information’, which is supposed to measure three-way dependencies between three variables. It can famously be negative, a fact that often surprises people. On the other hand, because it is defined in terms of conditional mutual information, it isn't actually able to distinguish the Crutchfield-James distributions.
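To see how stark this is, here is a sketch of what I believe the James & Crutchfield construction amounts to (the ‘dyadic’ vs ‘triadic’ distributions, reconstructed from memory, so treat the details as an assumption): two joint distributions over three two-bit variables that are built very differently, yet have identical entropies on every subset of variables, and hence identical mutual informations, conditional mutual informations and interaction information.

```python
import itertools
from collections import defaultdict
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(dist, idxs):
    out = defaultdict(float)
    for outcome, p in dist.items():
        out[tuple(outcome[i] for i in idxs)] += p
    return out

# Dyadic: three independent fair bits a, b, c, shared pairwise:
#   X = (a, b), Y = (b, c), Z = (c, a).
dyadic = defaultdict(float)
for a, b, c in itertools.product([0, 1], repeat=3):
    dyadic[((a, b), (b, c), (c, a))] += 1 / 8

# Triadic: bits a, b with c = a XOR b, plus a single bit w shared by everyone:
#   X = (a, w), Y = (b, w), Z = (c, w).
triadic = defaultdict(float)
for a, b, w in itertools.product([0, 1], repeat=3):
    triadic[((a, w), (b, w), (a ^ b, w))] += 1 / 8

# Every Shannon-type quantity (mutual information, conditional MI, interaction
# information, ...) is a signed sum of joint entropies of subsets of variables,
# so comparing the full entropy profile is enough.
for name, dist in [("dyadic", dyadic), ("triadic", triadic)]:
    profile = {s: round(entropy(marginal(dist, list(s))), 3)
               for r in (1, 2, 3) for s in itertools.combinations(range(3), r)}
    print(name, profile)
# Both print: singles = 2.0 bits, pairs = 3.0 bits, triple = 3.0 bits.
```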
We need new ideas. An obvious one would be to involve topological, homotopical & knot-theoretic notions; these theories are the canonical ways of dealing with higher-order dependencies. In fact, I would say this is so well understood that the ‘problem’ of higher-order dependencies has been ‘solved’ by modern homotopy theory [which includes cohomology theory]. On the other hand, it is not immediately clear how to ‘integrate’ ideas from homotopy theory directly into Shannon information theory, but see below for one such attempt.
I see three possible ways to attack this problem:
Factored Sets
Cyclic Causality
Information Cohomology
Factored Sets. The advantage of factored sets is that they can deal with situations where some variables are XORs of other variables, and pick out the ‘more primitive’ variables. In a sense this solves the famous philosophical blue-grue problem. A testament to the singular genius of our very own Garrabrant! Could this allow us to define the ‘right’ variant of Shannon information theory?
EDIT: in a conversation, Scott suggested to me that the problem is that they chose an implicit factorization of the variables: they should be looking at all possible variables. If we also look at information measures on derived variables like X XOR Y, we should be able to tell the distributions apart.
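If Scott is right, a derived variable such as the bitwise XOR of two of the variables should separate the two distributions, and at least on my reconstruction of them (same construction as in the sketch above) it does:

```python
import itertools
from collections import defaultdict
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Same dyadic / triadic constructions as in the earlier sketch.
dyadic = defaultdict(float)
for a, b, c in itertools.product([0, 1], repeat=3):
    dyadic[((a, b), (b, c), (c, a))] += 1 / 8

triadic = defaultdict(float)
for a, b, w in itertools.product([0, 1], repeat=3):
    triadic[((a, w), (b, w), (a ^ b, w))] += 1 / 8

# Distribution of the derived variable X XOR Y (bitwise) under each joint.
for name, dist in [("dyadic", dyadic), ("triadic", triadic)]:
    derived = defaultdict(float)
    for (x, y, z), p in dist.items():
        derived[(x[0] ^ y[0], x[1] ^ y[1])] += p
    print(name, round(entropy(derived), 3))
# dyadic:  H(X XOR Y) = 2.0 bits
# triadic: H(X XOR Y) = 1.0 bit  -- the derived variable tells them apart.
```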
Cyclic Causality. Pearl's DAG causality framework is rightly hailed as a great revolution; indeed, its consequences have only begun to be absorbed. There is a similar, in fact more general, theory already known in econometrics research: ‘structural equation modelling’ (SEM).
SEM is more general in that it allows for cyclic causality. Although this might conjure up for the reader back-to-the-future grandfather paradoxes, closed time-like curves and other such exotic beasts, cyclic causality is actually a much more down-to-earth notion: think about a pair of balls connected by a string. The positions of the balls are correlated, and in fact this correlation is causal: move one ball and the other will move with it, and vice versa. Cyclic causality!
Of course, balls connected by strings are absolutely fundamental in physics: when the string is elastic this becomes a coupled harmonic oscillator!
A paper that goes into the details of latent and cyclic causality can be found here.
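As a toy illustration (mine, not from the paper linked above), here is a two-variable linear cyclic SEM in the spirit of the balls-on-a-string picture: the two equations refer to each other, the observed values are their simultaneous solution, and intervening on either variable moves the other.

```python
import random

# A two-variable linear cyclic structural equation model (SEM):
#   x = a * y + e_x
#   y = b * x + e_y
# The two equations refer to each other, so there is no DAG; the observed
# values are the simultaneous (equilibrium) solution, which exists as long
# as a * b != 1.
a, b = 0.5, 0.5

def observe(rng):
    e_x, e_y = rng.gauss(0, 1), rng.gauss(0, 1)
    x = (e_x + a * e_y) / (1 - a * b)  # solve both equations at once
    y = (e_y + b * e_x) / (1 - a * b)
    return x, y

def do_x(x_value, rng):
    # Intervene on x: delete x's equation, keep y's -> y still responds to x.
    return x_value, b * x_value + rng.gauss(0, 1)

def do_y(y_value, rng):
    # Intervene on y: delete y's equation, keep x's -> x still responds to y.
    return a * y_value + rng.gauss(0, 1), y_value

rng = random.Random(0)
print(observe(rng))     # the two positions are correlated
print(do_x(10.0, rng))  # pushing x drags y along ...
print(do_y(10.0, rng))  # ... and pushing y drags x along: cyclic causality
```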
Information Cohomology. There is an obscure theory called information cohomology, based on the observation that Shannon entropy can be seen as a 1-cocycle for a certain coboundary operator. Using the abstract machinery of topos & cohomology theory it is then possible to define higher-order cohomology groups and therefore higher-order Shannon entropy. I will not define information cohomology here, but let me observe:
- Entropy H(*) is a cocycle, but not a coboundary. So H gives rise to a cohomology class, and this class is actually nontrivial. (The cocycle equation for entropy is written out just after this list.)
- Conditional mutual information is neither a cocycle nor a coboundary. This might be either a bug or a feature of the theory; I'm not sure yet.
- Interaction information I(*,*,*) is a coboundary and thus *also* a cocycle. In other words, it is trivial as a cohomology class. This might be a feature of the theory! As observed above, because interaction information is defined via conditional mutual information it cannot actually pick up on subtle higher-order dependencies and distinguish the James-Crutchfield distributions… but a nontrivial 3-cocycle in the $H^3$ of information cohomology might! This would be really awesome. I will need to speak to an expert to see if this works out.
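For what it's worth, here is my understanding (following Baudot & Bennequin) of what the 1-cocycle condition for entropy says; the action of a variable on a functional is averaging over conditioning, and the cocycle equation is just the chain rule:

$$(\delta H)(X;Y) \;=\; X.H(Y) \;-\; H(XY) \;+\; H(X) \;=\; H(Y \mid X) - H(X,Y) + H(X) \;=\; 0,$$

where $X.H(Y) := \sum_x p(x)\, H(Y \mid X{=}x) = H(Y \mid X)$.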
[Unimportant remark. That Shannon entropy might have something to do with differential operators is perhaps not completely insane, owing to an observation due to John Baez: the Faddeev characterisation of Shannon entropy involves derivatives of ‘glomming’ partition functions. It is unclear to me whether this is related to the information cohomology perspective, however.]
Remark. I have been informed that in both the cyclic causality and the information cohomology frameworks it seems natural to have an ‘asymmetric independence’ relation. This is actually very natural from the point of view of Imprecise Probability & InfraBayesianism: there, the notion of independence ‘concept splinters’ into at least three different notions, two of which are asymmetric and are better thought of as ‘irrelevance’ rather than ‘independence’… but I'm splitting hairs here.