The Metaphysical Structure of Pearl’s Theory of Time
Epistemic status: metaphysics
I was reading Factored Space Models (previously, Finite Factored Sets) and was trying to understand in what sense it was a Theory of Time.
Scott Garrabrant says “[The Pearlian Theory of Time] … is the best thing to happen to our understanding of time since Einstein”. I read Pearl’s book on Causality[1], and while there’s plenty of math, the metaphysical connection that Scott seems to make isn’t really spelled out. Timeless Causality and Timeless Physics are the only places where I have seen this view stated explicitly, but not at the level of mathematical precision used in Pearl’s book.
Here is my attempt to write down explicitly, in more rigorous language, what all of these views are pointing at: the core of the Pearlian Theory of Time, and the sense in which FSM shares the same structure.
Causality leaves a shadow of conditional independence relationships on the observational distribution. Here’s an explanation that provides the core intuition:
Suppose you represent the ground truth structure of [causality / determination] of the world via a Structural Causal Model over some variables, a very reasonable choice. Then, as you go down Pearl’s ladder, SCM →[2] Causal Bayes Net →[3] Bayes Net, theorems guarantee that the Bayes Net is still Markovian with respect to the observational distribution.
(Read Timeless Causality for an intuitive example.)
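To make the “shadow” concrete, here is a minimal sketch under assumptions of my own: a hypothetical binary SCM A := U_A, B := A XOR U_B, C := B XOR U_C, with arbitrarily chosen noise probabilities. It pushes the exogenous uncertainty forward to an exact joint distribution and checks that the chain’s d-separation statement A ⊥ C | B appears as a conditional independence in the observational distribution, while A and C stay marginally dependent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical binary SCM for the chain A -> B -> C:
#   A := U_A,   B := A XOR U_B,   C := B XOR U_C
# with mutually independent exogenous noise. Values are P(U = 1), chosen arbitrarily.
P_U = {"U_A": Fraction(1, 2), "U_B": Fraction(1, 5), "U_C": Fraction(1, 3)}


def bern(p, u):
    """Probability that a Bernoulli(p) variable takes the value u."""
    return p if u == 1 else 1 - p


def joint_distribution():
    """Push the exogenous uncertainty forward to an exact P(A, B, C)."""
    joint = {}
    for ua, ub, uc in product((0, 1), repeat=3):
        weight = bern(P_U["U_A"], ua) * bern(P_U["U_B"], ub) * bern(P_U["U_C"], uc)
        a = ua
        b = a ^ ub
        c = b ^ uc
        joint[(a, b, c)] = joint.get((a, b, c), 0) + weight
    return joint


def marginal(joint, idx):
    """Marginalize onto the coordinate positions listed in idx."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out


def independent(joint, x, y, given=()):
    """Exact test of X ⊥ Y | Z via P(x,y,z) P(z) == P(x,z) P(y,z) for all values."""
    p_xyz = marginal(joint, (x, y) + given)
    p_xz = marginal(joint, (x,) + given)
    p_yz = marginal(joint, (y,) + given)
    p_z = marginal(joint, given)
    for vx, vy, *rest in product((0, 1), repeat=2 + len(given)):
        vz = tuple(rest)
        lhs = p_xyz.get((vx, vy) + vz, 0) * p_z.get(vz, 0)
        rhs = p_xz.get((vx,) + vz, 0) * p_yz.get((vy,) + vz, 0)
        if lhs != rhs:
            return False
    return True


A, B, C = 0, 1, 2  # coordinate positions in the joint distribution
P = joint_distribution()
print(independent(P, A, C, given=(B,)))  # True: A ⊥ C | B holds
print(independent(P, A, C))              # False: A and C are marginally dependent
```

Exact fractions are used instead of sampling so that the independence check is an equality test rather than a statistical one.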
Causal Discovery then (at least in this example) reduces to inferring the equation assignment directions of the SCM, given only the observational distribution.
The earlier result guarantees that it suffices to find a Bayes Net that is Markovian with respect to the observational distribution. Together with the faithfulness assumption, this reduces to finding a Bayes Net structure G whose set of independencies (implied by d-separation) is identical to that of P (i.e., finding the Perfect Map of the distribution[4]).
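The d-separation gadget can be sketched as follows. This uses the moralized ancestral graph criterion, which is one standard way to test d-separation. The function names and the node-to-parents dictionary encoding are my own choices. The example shows that a chain and a collider over the same skeleton imply different independencies.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """Test whether xs and ys are d-separated by zs in the DAG `parents`
    (a dict node -> list of parents), via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: undirected parent-child links,
    # plus "marriages" between every pair of co-parents.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete the conditioning set, then search for a path from xs to ys.
    seen = xs - zs
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v] - zs:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


chain = {"A": [], "B": ["A"], "C": ["B"]}       # A -> B -> C
collider = {"A": [], "B": [], "C": ["A", "B"]}  # A -> C <- B
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True
print(d_separated(chain, {"A"}, {"C"}, set()))     # False
print(d_separated(collider, {"A"}, {"B"}, set()))  # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False
```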
Then, for many distributions, at least some of the edges of the Perfect Map have their directions nailed down by the conditional independence relations.
The metaphysical claim is that this direction is, morally, the definition of time[5], based on the intuition provided by the example above.
So, the Pearlian Theory of Time is the claim that Time is the partial order over the variables of a Bayes Net corresponding to the perfect map of a distribution.
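Read literally, the order here is the ancestor relation of the DAG: u precedes v iff there is a directed path from u to v. Below is a small sketch that computes this relation as a transitive closure. The `strict_order` helper and the example graph are hypothetical and for illustration only. The sketch takes a given DAG at face value and ignores which of its edges are actually nailed down by the independencies.

```python
from itertools import product


def strict_order(parents):
    """Pairs (u, v) with u a proper ancestor of v: the transitive closure
    of the parent -> child relation of a DAG given as node -> parents."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    before = {(p, v) for v, ps in parents.items() for p in ps}
    for k in nodes:  # Warshall's transitive-closure algorithm
        for i, j in product(nodes, repeat=2):
            if (i, k) in before and (k, j) in before:
                before.add((i, j))
    return before


dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}  # A -> C <- B, C -> D
order = strict_order(dag)
print(sorted(order))
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

A and B are incomparable, so the order is genuinely partial. Because the graph is acyclic, no pair (v, v) appears, which makes the relation a strict partial order.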
Abstracting away, the structure of any Theory of Time is then to:
find a mathematical structure [in the Pearlian Theory of Time, a Bayes Net]
… that has gadgets [d-separation]
… that are, in some sense, “equivalent” [soundness & completeness] to the conditional independence relations of the distribution the structure is modeling
… while containing a notion of order [parenthood relationship of nodes in a Bayes Net]
… where the order induced by the gadget coincides with the order given by d-separation [trivially so here, because we’re talking about Bayes Nets and d-separation], so that it captures the earlier example that provided the core intuition behind our Theory of Time.
This is exactly what a Factored Space Model does:
find a mathematical structure [Factored Space Model]
… that has gadgets [structural independence]
… that are, in some sense, “equivalent” [soundness & completeness] to the conditional independence relations of the distribution the structure is modeling
… while containing a notion of order [the preorder induced by the subset relation between histories]
… where the order induced by the gadget coincides with the order given by d-separation [by a theorem of FSM], so that it captures the earlier example that provided the core intuition behind our Theory of Time.
… while additionally generalizing the scope of our Theory of Time from [variables that appear in the Bayes Net] to [any variable defined over the factored space].
… thus justifying calling FSM a Theory of Time in the same spirit that Pearlian Causal Discovery is a Theory of Time.
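For comparison, here is a toy sketch of the FSM side under my own simplifying assumptions. It uses a factored set with two binary factors, represented as the product set {0,1}², and treats variables as functions on that set. The history of a variable is computed as the set of factors it actually depends on, which on a product set is the smallest set of factors that determines it. Structural independence (unconditional orthogonality) is disjointness of histories, and the time preorder is inclusion of histories. Conditional orthogonality, which FSM also defines, is omitted. All names are hypothetical.

```python
from itertools import product

# Hypothetical finite factored set: two binary factors, so S = {0,1} x {0,1}.
# An element s of S is a tuple; factor i is "the i-th coordinate of s".
FACTORS = (0, 1)
S = list(product((0, 1), repeat=len(FACTORS)))


def history(var):
    """Smallest set of factors that determines `var` (a function on S).
    On a product set this is exactly the set of coordinates `var` depends on."""
    h = set()
    for i in FACTORS:
        for s in S:
            t = tuple(1 - v if j == i else v for j, v in enumerate(s))  # flip factor i
            if var(s) != var(t):
                h.add(i)
                break
    return frozenset(h)


def orthogonal(x, y):
    """Structural independence (unconditional orthogonality): disjoint histories."""
    return not (history(x) & history(y))


def weakly_before(x, y):
    """The FSM time preorder: x is weakly before y iff h(x) is a subset of h(y)."""
    return history(x) <= history(y)


def X(s):
    return s[0]  # the first factor itself


def Y(s):
    return s[1]  # the second factor itself


def Z(s):
    return s[0] ^ s[1]  # a derived variable, not a factor


print(orthogonal(X, Y), weakly_before(X, Z), weakly_before(Z, X))  # True True False
```

Here X and Y are orthogonal, and both are weakly before Z = X XOR Y. This mirrors the collider X → Z ← Y, even though Z never appears as a node of any graph.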
Chapter 2, specifically, which is about Causal Discovery. All the other chapters are mostly irrelevant for this purpose.
By (1) making a graph whose edge directions correspond to the equation assignment directions, (2) pushing forward the uncertainty over the exogenous variables to the endogenous variables, and (3) defining interventional distributions by the truncated factorization formula.
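For reference, here is the truncated factorization formula of step (3), in my transcription. Write pa_i for the values taken by the parents of V_i, and X ⊆ V for the intervened variables:

```latex
P\bigl(v_1, \dots, v_n \mid do(X = x)\bigr) =
\begin{cases}
  \displaystyle\prod_{i \,:\, V_i \notin X} P(v_i \mid pa_i) & \text{if } (v_1, \dots, v_n) \text{ is consistent with } x, \\
  0 & \text{otherwise.}
\end{cases}
```

With X = ∅ the product runs over all variables, which recovers the ordinary observational factorization P(v_1, …, v_n) = ∏_i P(v_i | pa_i).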
By forgetting the causal semantics, i.e., no longer associating the graph with all the interventional distributions, but only with the no-intervention observational distribution.
This shortform answers this question I had.
Pearl comes very close. In his Temporal Bias Conjecture (2.8.2):
(where statistical time refers to the aforementioned direction.)
But he doesn’t go as far as saying that this ought to be the definition of Time.
This approach goes back to Hans Reichenbach’s book The Direction of Time. I think the problem is that the set of independencies alone is not sufficient to determine a causal and temporal order. For example, the same independencies between three variables could be interpreted as the chains A→B→C and A←B←C. I think Pearl talks about this issue in the last chapter.
The critical insight is that this is not always the case!
Let’s call two graphs I-equivalent if their sets of independencies (implied by d-separation) are identical. A theorem of Bayes Nets says that two graphs are I-equivalent if and only if they have the same skeleton and the same set of immoralities.
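The theorem is easy to check mechanically. In the minimal sketch below, helper names are hypothetical, and graphs are assumed to be DAGs given as sets of (parent, child) pairs. It confirms the point raised in the comment: the chains A→B→C and A←B←C, and the fork A←B→C, are all I-equivalent, while the collider A→B←C is not.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def immoralities(edges):
    """All triples (a, c, b) with a -> c <- b and a, b nonadjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {(a, c, b)
            for c, ps in parents.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}


def i_equivalent(g1, g2):
    """Same skeleton and same immoralities (assumes both graphs are DAGs)."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)


chain = {("A", "B"), ("B", "C")}     # A -> B -> C
reverse = {("C", "B"), ("B", "A")}   # A <- B <- C
fork = {("B", "A"), ("B", "C")}      # A <- B -> C
collider = {("A", "B"), ("C", "B")}  # A -> B <- C

print(i_equivalent(chain, reverse))   # True
print(i_equivalent(chain, fork))      # True
print(i_equivalent(chain, collider))  # False
```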
This last constraint, plus the constraint that the graph must be acyclic, allows some arrow directions to be identified: across all I-equivalent graphs that are perfect maps of a distribution, some of the edges are assigned identical directions.
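Here is a brute-force sketch of this identification step for small graphs. It enumerates every orientation of the skeleton and keeps the acyclic ones that have the same immoralities, which gives the I-equivalence class. It then intersects the edge sets of the class members, and the edges that survive are the ones whose direction is fixed. Helper names and example graphs are my own, and practical algorithms avoid this exponential enumeration.

```python
from itertools import combinations, product


def immoralities(edges):
    """All triples (a, c, b) with a -> c <- b and a, b nonadjacent."""
    skel = {frozenset(e) for e in edges}
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {(a, c, b)
            for c, ps in parents.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}


def is_acyclic(edges):
    """Kahn's algorithm: True iff the directed graph has no directed cycle."""
    nodes = {n for e in edges for n in e}
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    queue.append(v)
    return removed == len(nodes)


def equivalence_class(dag):
    """Brute force: every orientation of the skeleton that is acyclic and
    has the same immoralities as `dag`."""
    skel = [tuple(sorted(e)) for e in dag]
    target = immoralities(dag)
    members = []
    for flips in product((False, True), repeat=len(skel)):
        g = {(v, u) if flip else (u, v) for (u, v), flip in zip(skel, flips)}
        if is_acyclic(g) and immoralities(g) == target:
            members.append(g)
    return members


def compelled_edges(dag):
    """Directed edges shared by every member of the I-equivalence class."""
    return set.intersection(*equivalence_class(dag))


chain = {("A", "B"), ("B", "C")}                      # A -> B -> C
collider_plus = {("A", "C"), ("B", "C"), ("C", "D")}  # A -> C <- B, C -> D
print(len(equivalence_class(chain)), compelled_edges(chain))  # 3 set()
print(sorted(compelled_edges(collider_plus)))
# [('A', 'C'), ('B', 'C'), ('C', 'D')]
```

Nothing is compelled for the chain, which is the commenter’s worry. Adding C → D below the collider A → C ← B pins down every edge, including C → D, which is not itself part of an immorality: reversing it would create the new immoralities A → C ← D and B → C ← D.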
The IC algorithm (Verma & Pearl, 1990) for finding perfect maps (hence temporal direction) is exactly about exploiting these conditions to orient as many of the edges as possible:
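A minimal sketch of the first two steps of IC, as I understand them. Step (1) connects a and b iff no separating set S_ab makes them independent. Step (2) takes every nonadjacent pair a, b with a common neighbour c and orients a → c ← b whenever c ∉ S_ab. The independence oracle is a hypothetical hard-coded list standing in for statistical tests: it contains the independencies that a faithful distribution over A → C ← B, C → D would show. The remaining undirected edge C-D is left for the orientation rules.

```python
from itertools import combinations

VARIABLES = ["A", "B", "C", "D"]

# Hypothetical oracle standing in for statistical CI tests: the independencies
# (pair, conditioning set) that a faithful distribution over the DAG
# A -> C <- B, C -> D would exhibit.
INDEPENDENCIES = {
    (frozenset({"A", "B"}), frozenset()),
    (frozenset({"A", "D"}), frozenset({"C"})),
    (frozenset({"A", "D"}), frozenset({"B", "C"})),
    (frozenset({"B", "D"}), frozenset({"C"})),
    (frozenset({"B", "D"}), frozenset({"A", "C"})),
}


def indep(a, b, s):
    """Does the oracle report a ⊥ b | s?"""
    return (frozenset({a, b}), frozenset(s)) in INDEPENDENCIES


def ic_skeleton(variables):
    """IC step 1: link a - b iff no separating set S_ab exists."""
    edges, sepset = set(), {}
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        candidates = (set(s) for k in range(len(others) + 1)
                      for s in combinations(others, k))
        found = next((s for s in candidates if indep(a, b, s)), None)
        if found is None:
            edges.add(frozenset({a, b}))
        else:
            sepset[frozenset({a, b})] = found
    return edges, sepset


def ic_v_structures(variables, edges, sepset):
    """IC step 2: orient a -> c <- b for nonadjacent a, b with common
    neighbour c whenever c is not in S_ab."""
    arrows = set()
    for a, b in combinations(variables, 2):
        if frozenset({a, b}) in edges:
            continue
        for c in variables:
            if (frozenset({a, c}) in edges and frozenset({b, c}) in edges
                    and c not in sepset[frozenset({a, b})]):
                arrows |= {(a, c), (b, c)}
    return arrows


edges, sepset = ic_skeleton(VARIABLES)
arrows = ic_v_structures(VARIABLES, edges, sepset)
print(sorted(tuple(sorted(e)) for e in edges))  # [('A', 'C'), ('B', 'C'), ('C', 'D')]
print(sorted(arrows))                           # [('A', 'C'), ('B', 'C')]
```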
More intuitively, (Verma & Pearl, 1992) and (Meek, 1995) together show that the following four rules are necessary and sufficient to maximally orient the graph according to the I-equivalence (+ acyclicity) constraint:
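A sketch of orientation rules R1-R3, following Pearl’s Ch. 2 formulation as I recall it:

- R1 orients b-c as b→c if a→b and a, c are nonadjacent.
- R2 orients a-b as a→b if there is a chain a→c→b.
- R3 orients a-b as a→b if a-c→b and a-d→b with c, d nonadjacent.

R4 is omitted. As I understand Meek (1995), R4 is only needed when background knowledge about orientations is added. The graph encoding and function names are my own.

```python
def adjacent(x, y, directed, undirected):
    """True if x and y are joined by any edge, directed or not."""
    return ((x, y) in directed or (y, x) in directed
            or frozenset({x, y}) in undirected)


def should_orient(x, y, nodes, directed, undirected):
    """True if the undirected edge x - y is forced to be x -> y by R1, R2 or R3."""
    # R1: some a -> x with a, y nonadjacent (y -> x would add an immorality).
    if any((a, x) in directed and a != y and not adjacent(a, y, directed, undirected)
           for a in nodes):
        return True
    # R2: a chain x -> c -> y (y -> x would close a directed cycle).
    if any((x, c) in directed and (c, y) in directed for c in nodes):
        return True
    # R3: x - c -> y and x - d -> y with c, d nonadjacent.
    cs = [c for c in nodes if frozenset({x, c}) in undirected and (c, y) in directed]
    return any(not adjacent(c, d, directed, undirected)
               for i, c in enumerate(cs) for d in cs[i + 1:])


def apply_meek_rules(nodes, directed, undirected):
    """Apply R1-R3 until no undirected edge can be oriented any further."""
    directed, undirected = set(directed), set(undirected)
    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            u, v = tuple(edge)
            for x, y in ((u, v), (v, u)):
                if should_orient(x, y, nodes, directed, undirected):
                    undirected.discard(edge)
                    directed.add((x, y))
                    changed = True
                    break
    return directed, undirected


nodes = ["A", "B", "C", "D"]
# Pattern left by IC steps 1-2 for A -> C <- B, C -> D: only C - D is undirected.
d, u = apply_meek_rules(nodes, {("A", "C"), ("B", "C")}, {frozenset({"C", "D"})})
print(sorted(d), u)  # [('A', 'C'), ('B', 'C'), ('C', 'D')] set()
```

Here R1 fires: A → C with A and D nonadjacent forces C → D.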
Anyone interested in further detail should consult Pearl’s Causality Ch 2. Note that for some reason Ch 2 is the only chapter in the book where Pearl talks about Causal Discovery (i.e. inferring time from observational distribution) and the rest of the book is all about Causal Inference (i.e. inferring causal effect from (partially) known causal structure).
Ah yes, the fork asymmetry. I think Pearl believes that correlations reduce to causation, which is probably why he doesn’t try the converse reduction, of causal structure to a set of (in)dependencies. I’m not sure whether the latter reduction is ultimately possible in the universe. Are the correlations present in the universe, e.g. defined via the Albert/Loewer Mentaculus probability distribution, sufficient to recover the familiar causal structure of the universe?