On the basis of these remarks I submit the following qualified statement: while the belief network paradigm is mathematically elegant and intuitively appealing, it is NOT very useful for describing real data.
It’s sometimes possible to automatically reconstruct causal models from data. For example, see Cosma Shalizi’s thesis and CSSR software; they completely changed my view of the subject.
Thanks cousin_it! Read that thesis, everyone else! I just did, and it’s amazing. Among other things, it contains a nice reduction of “emergence”, one that isn’t magical. Basically a process is emergent just when the fraction of historical memory stored in it which does “useful work” in the form of telling us about the future is greater than this fraction is in the process it derives from (pg 115-116).
More precisely, the fraction is the ratio of the process’s excess entropy (the mutual information between its semi-infinite past and its semi-infinite future) to its statistical complexity (the entropy of the process’s causal states). Informally, the causal states are the classes you get by grouping together “inputs” that lead to the same probability distribution over outputs.
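In symbols, as a reader's sketch rather than a quotation: the notation below follows the usual computational-mechanics convention and is an assumption on my part, so check it against the thesis before relying on it.

```latex
\[
  E = I\bigl[\overleftarrow{X};\,\overrightarrow{X}\bigr], \qquad
  C_\mu = H[\mathcal{S}], \qquad
  e = \frac{E}{C_\mu}
\]
```

Here $\overleftarrow{X}$ and $\overrightarrow{X}$ stand for the semi-infinite past and future, $\mathcal{S}$ for the causal state, and $e$ for the "fraction" in question (defined only when $C_\mu > 0$). On this reading, a derived process counts as emergent when its $e$ is larger than the $e$ of the process it derives from.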
Thermodynamic macrostate processes are emergent because they more efficiently predict the future than their underlying microstate processes.
The thesis also gives a non-trivial necessary condition for describing a process as “self-organizing”, which is that its statistical complexity increases over time—the causal architecture of the process does not change, but the amount of information needed to place the process in a state within the architecture increases. For example, a system that will go from uniform behavior to periodic behavior over time is self-organizing.
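To make the uniform-to-periodic example concrete, here is a minimal sketch of the quantity involved. It assumes we already know the probability of each causal state; the function name and the two toy distributions are mine, not from the thesis. A constant process is taken to have a single causal state, and a period-2 process two equally likely phase states.

```python
import math

def statistical_complexity(state_probs):
    """Shannon entropy, in bits, of a distribution over causal states."""
    return sum(-p * math.log2(p) for p in state_probs if p > 0)

# Hypothetical toy cases (assumed causal-state distributions):
uniform = statistical_complexity([1.0])        # constant process, one state
periodic = statistical_complexity([0.5, 0.5])  # period-2 process, two phase states

print(uniform, periodic)
```

Under these assumptions the statistical complexity goes from 0 bits to 1 bit, which is the kind of increase the necessary condition asks for.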
Anyway, I took most of that straight out of Chapter 11 of Cosma Shalizi’s thesis, and that’s the concluding summary chapter, so if you suspect that something I just said isn’t very rigorous, check out the thesis itself. You may or may not be disappointed; from Shalizi’s introduction:
A word about the math. I aim at a moderate degree of rigor throughout, but as two wise men have remarked, “One man’s rigor is another man’s mortis” (Bohren and Albrecht 1998). My ideal has been to keep to about the level of rigor of Shannon (1948). In some places (like Appendix B.3), I’m closer to the onset of mortis. No result on which anything else depends should have an invalid proof. There are places, naturally, where I am not even trying to be rigorous, but merely plausible, or even “physical,” but it should be clear from context where those are.
Thanks for the thesis link. That looks to be an interesting read and should be quite informative about pattern recognition, entropy, and complexity.
Some of you may remember a previous discussion of Shalizi here in which he was criticized for his position on thermodynamics. (Scroll to the bottom.)
By the way, I thought Pearl gave algorithms for identifying causal structure in Causality or Probabilistic Reasoning. Are they not effective enough?
Probabilistic Reasoning in Intelligent Systems
Chapter 8, Learning Structure from Data
8.2 presents algorithms for learning tree-shaped networks only.
8.3 discusses how to learn tree-shaped networks using hidden independent variables.
Do you mean this? Pearl and Verma, “A Theory of Inferred Causation”. As far as I can see the definitions are very similar, but Pearl’s “algorithm” requires complete knowledge of the full distribution, while Shalizi’s reconstruction approach works from samples.
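To illustrate what “works from samples” can mean, here is a toy sketch. It is not CSSR and not Pearl’s algorithm: it simply estimates next-symbol distributions from one observed sequence and groups fixed-length histories whose estimates agree within a tolerance. The function names, the fixed history length, and the crude `tol` threshold are all my own assumptions; a real reconstruction would use proper statistical tests.

```python
from collections import Counter, defaultdict

def next_symbol_distributions(seq, history_len):
    """Estimate P(next symbol | last `history_len` symbols) from one sample."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(seq)):
        counts[seq[i - history_len:i]][seq[i]] += 1
    return {
        h: {s: n / sum(c.values()) for s, n in c.items()}
        for h, c in counts.items()
    }

def group_histories(dists, tol=0.05):
    """Group histories whose estimated next-symbol distributions match.

    `tol` is a hypothetical stand-in for a real significance test.
    """
    groups = []  # list of (representative distribution, member histories)
    for history, dist in sorted(dists.items()):
        for rep, members in groups:
            symbols = set(rep) | set(dist)
            if all(abs(rep.get(s, 0.0) - dist.get(s, 0.0)) <= tol for s in symbols):
                members.append(history)
                break
        else:
            groups.append((dist, [history]))
    return [members for _, members in groups]

sample = "01" * 50  # a period-2 sample sequence
print(group_histories(next_symbol_distributions(sample, 1)))
```

On the period-2 sample, the histories “0” and “1” predict different next symbols, so they land in separate groups; on a constant sequence there would be only one group.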