Toward “timeless” continuous-time causal models
I’m a bit at a loss as to where to put this. I know the inferential gap is too great for it to go anywhere but here, and I know that the number of people on LW interested in this subject could be counted on one hand. The prerequisites would almost certainly be Timeless Causality and more mathematics than anyone is really interested in learning.
So, I apologize in advance if you read this and discover at the end it was a waste of your time. But at the same time, I need people who know about these things to talk about them with me, to ensure that I haven’t gone crazy… yet. And most importantly, I need to know the people who have done this before, so that I don’t have to do it. Google can’t find them.
Introduction
There are currently some efforts to generalize the causal models of Pearl to continuous-time situations. Most of these attempts involve replacing some discrete causal variables $X_i$ with time-dependent random variables $X_i(t)$. Possibly due to memetic infection from Yudkowsky, I don’t think this is necessarily the correct approach. The philosophical power of Pearl’s theory comes from the fact that it is timeless, that ba’o vimcu ty bu (Lojban, roughly “the t has been removed”).
In order to motivate my working definitions for where such a timeless continuous-time theory will go, I need to go back to classical causality and decide what a timeless formulation actually means, formally. Spoiler: it means replacing time-dependent evolution with a global flow on the phase space of the system. This is more or less in line with what is said in Timeless Physics with regard to the glimpse of “quantum mist” illustrated there.
The role of phase space
What is “timelessness”? The first thing I thought of after reading the timeless subsequence was, “What does a timeless formulation of the wave equation look like?” First of all, this was the right thought, because the wave equation is what I’ll call (after the fact) “classically causal” in a sense to be described soon. I wouldn’t have seen the timelessness in a different mathematical model, because not all mathematical models of reality preserve the underlying phenomena’s causal structure. On the other hand, this was the wrong thought, because the wave equation is not the simplest continuous-time system that would have led me to this formalization of timelessness. Unfortunately the one that is easier for me to see (Lagrangian mechanics) is harder for me to explain, so you’re stuck with a suboptimal explanation.
The wave equation models all sorts of wave-like phenomena: light, acoustic waves, earthquakes, and so on. If we take the speed of sound to be one (as physicists are wont to do), the dispersion relation is $\omega^2 = k^2$. Such a dispersion relation satisfies the Kramers-Kronig relation. As it turns out, equations whose dispersion relation satisfies this condition satisfy what I’m calling “classical causality”, but what is more commonly known as finite speed of propagation, or, more physically speaking, the fact that signals stay within their light cone.
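To make “signals stay within their light cone” concrete, here is a small numerical sketch. It is my illustration, not part of the argument, and it does not check the Kramers-Kronig condition. It assumes the 1D wave equation $u_{tt} = u_{xx}$ with unit speed, zero initial velocity, and a grid whose time step equals its space step; in that case the standard leapfrog scheme reproduces d’Alembert’s solution exactly, and a disturbance can move at most one cell per step in either direction.

```python
import numpy as np

def evolve_wave(u0, steps):
    """Leapfrog scheme for u_tt = u_xx with unit speed and dt == dx.

    At Courant number 1 the update u[n+1, j] = u[n, j+1] + u[n, j-1] - u[n-1, j]
    reproduces d'Alembert's solution exactly on the grid. The initial velocity is
    zero and both ends of the grid are held at zero. Returns time level steps + 1.
    """
    prev = np.asarray(u0, dtype=float)
    cur = np.zeros_like(prev)
    cur[1:-1] = 0.5 * (prev[2:] + prev[:-2])    # exact first step for zero initial velocity
    for _ in range(steps):
        nxt = np.zeros_like(cur)
        nxt[1:-1] = cur[2:] + cur[:-2] - prev[1:-1]
        prev, cur = cur, nxt
    return cur

def support(u):
    """Indices of the leftmost and rightmost nonzero grid cells."""
    idx = np.flatnonzero(u)
    return int(idx[0]), int(idx[-1])

u0 = np.zeros(201)
u0[98:103] = 1.0                  # initial disturbance on cells 98..102
u = evolve_wave(u0, steps=49)     # time level 50
print(support(u))                 # (48, 152): exactly 50 cells further out on each side
```

After 50 time levels the pulse has split into two half-height copies 50 cells to either side, and every cell in between is exactly zero.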
The most common problem associated with the wave equation is the Cauchy problem. At time zero, we specify the state of the system: its initial position and velocity at every point. Then the solution of the wave equation describes how that initial state evolves with time. From a more abstract point of view, this evolution is a curve in the space of all possible initial states. This space is commonly referred to in the specific case of the wave equation as “energy space”, which further illustrates why this example is a bit bad for pedagogical purposes. From now on, we’re only going to talk about phase space.
Here is where we can remove time from the equation. Instead of thinking of the wave equation as associating to every state in phase space a time-dependent curve issuing forth from it, we’re going to think of the wave equation as specifying a global flow on the whole of phase space, all at once. In summary, I am led to believe that timeless formulations amount to abstracting away the time-dependence of the system’s evolution as a flow on the phase space of the system. And to think, this insight only took three years to internalize, provided I’ve gotten it correct.
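The simplest place to see the flow as a single object is the harmonic oscillator $\ddot x = -x$ rather than the wave equation. The sketch below is an illustration I’m choosing, not something from the post. It writes the oscillator’s flow $\varphi_t$ on the phase space of pairs $(x, v)$ in closed form. Time only appears as the label $t$ of a one-parameter family of maps on the whole space, and the group law $\varphi_{s+t} = \varphi_s \circ \varphi_t$ is what lets you stop tracking individual initial states.

```python
import math

def flow(t, state):
    """Exact flow of the harmonic oscillator x'' = -x on phase space (x, v).

    phi_t sends every state to the state t time units later; the family {phi_t}
    is defined on all of phase space at once.
    """
    x, v = state
    c, s = math.cos(t), math.sin(t)
    return (c * x + s * v, -s * x + c * v)

def close(p, q, tol=1e-12):
    """Componentwise comparison of two phase-space points."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

state = (1.0, 0.5)
s, t = 0.7, 1.9
print(close(flow(0.0, state), state))                       # phi_0 is the identity
print(close(flow(s + t, state), flow(s, flow(t, state))))   # phi_{s+t} = phi_s o phi_t
```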
Causal flow
The situation for a causal model is harder, in part because stochastic things have shoddy excuses for derivatives. For the moment, we’re going to take the easiest possible continuous-time system: our N causal variables of interest, $X_i$, take only real values, so the space of all possible states of the system is N-dimensional Euclidean space, which is easy enough to work with. I’m going to implicitly assume that causal variables evolve continuously; that is, the sidewalk doesn’t go from being completely dry to completely wet instantaneously. Things like light switches and push buttons can still be modeled practically by bump functions and the like, so I don’t see this as a real limitation.
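One standard way to model a light switch this way is the smooth step built from $e^{-1/t}$. It is exactly 0 before the switch is thrown, exactly 1 afterwards, and infinitely differentiable in between. The function names and the `t_on`/`width` parameters below are my own, chosen for illustration.

```python
import math

def _f(u):
    """exp(-1/u) for u > 0 and 0 otherwise: infinitely differentiable, flat at u = 0."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def smooth_switch(t, t_on=0.0, width=1.0):
    """Smooth 'light switch': exactly 0 for t <= t_on, exactly 1 for t >= t_on + width."""
    u = (t - t_on) / width
    return _f(u) / (_f(u) + _f(1.0 - u))   # denominator > 0 because u > 0 or 1 - u > 0

# A hypothetical sidewalk going from dry (0) to wet (1) over 0.1 time units starting at t = 2.
times = (1.9, 2.0, 2.05, 2.1, 2.2)
wetness = [smooth_switch(t, t_on=2.0, width=0.1) for t in times]
print(wetness)
```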
The somewhat harder bullet to swallow is the assumption that the random variables are Markovian; that is, they are “memoryless” in the sense that only the present state determines the future. Pearl spends some time in Causality defending this assumption from criticism that it doesn’t apply to quantum systems — I believe this defense is reasonable. I believe that causal models are necessarily refinements of our beliefs about what is still for the most part a classical world, and so the Markov assumption is not necessarily unnatural.
The phase space of a system whose states live in N-dimensional Euclidean space is its tangent bundle, which amounts to having an additional copy of N-space at every point (for Euclidean space, simply $\mathbb{R}^N \times \mathbb{R}^N$). Morally speaking, the tangent bundle represents all the directions and speeds in which the system can evolve from any given state.
We need some data about how the system is supposed to evolve: what I will call the causal flow. As best as I can currently conjecture, this data should take the form of a “bundle” of probability measures P, one for each point in N-space, such that each probability measure P(x) is defined over the tangent copy of N-space attached to that point.
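As a deliberately crude way to hold this data in a program, each tangent-space measure $P(x)$ can be discretized into finitely many velocity vectors with weights. Everything in the sketch below is hypothetical. The toy flow on $\mathbb{R}^2$ has $X_1$ relaxing toward 0 under random kicks and $X_2$ relaxing toward $X_1$, which is the kind of asymmetry a causal flow is meant to encode.

```python
import numpy as np

def causal_flow(x):
    """Hypothetical toy causal flow on R^2: returns (velocities, probs) approximating P(x).

    Each row of velocities is a tangent vector at x and probs[i] is its weight, so
    P(x) is a finite probability measure on the tangent copy of R^2 attached at x.
    X1 relaxes toward 0 with random kicks; X2 relaxes toward X1, never the reverse.
    """
    x1, x2 = x
    drift = np.array([-x1, x1 - x2])
    kicks = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.0]])   # only X1 is kicked
    probs = np.array([0.25, 0.25, 0.5])
    return drift + kicks, probs

velocities, probs = causal_flow(np.array([1.0, -2.0]))
assert np.all(probs >= 0) and np.isclose(probs.sum(), 1.0)   # a genuine probability measure
mean_velocity = probs @ velocities                           # expected direction of motion at x
print(mean_velocity)
```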
By analogy with the previous section, the time-evolution of the system is given by Lipschitz-continuous curves in N-space. (Lipschitz-continuous, because if we assume they are differentiable curves, the Markov assumption goes out the window.) In contrast with the discrete theory of causality, and as mentioned above, we don’t allow causal variables to “jump” spontaneously, and there is a limit to how fast they can change (though, unlike differentiable curves, they can still “turn” sharply at a corner).
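One way to see why bounded velocities give Lipschitz curves is to sample a path directly from a causal flow: hold a velocity drawn from $P(x)$ fixed for a short time step, then redraw. The sketch below assumes a hypothetical $P(x)$ on $\mathbb{R}^2$ whose support lies in a ball of radius `L_MAX`. Every resulting piecewise-linear path is then Lipschitz with constant `L_MAX`, though not differentiable at the step boundaries.

```python
import numpy as np

L_MAX = 1.0  # hypothetical speed bound; it is also the Lipschitz constant of every sampled path

def sample_velocity(x, rng):
    """Draw a velocity from a hypothetical P(x) on R^2 supported in the ball of radius L_MAX.

    The direction is biased toward the origin; the speed is uniform on [0, L_MAX).
    """
    direction = rng.normal(size=2) - x
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        return np.zeros(2)
    return rng.uniform(0.0, L_MAX) * direction / norm

def sample_path(x0, t_end, dt, rng):
    """Piecewise-linear path that holds each sampled velocity fixed for one step of length dt."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(int(round(t_end / dt))):
        xs.append(xs[-1] + dt * sample_velocity(xs[-1], rng))
    return np.array(xs)

rng = np.random.default_rng(0)
dt = 0.01
path = sample_path([2.0, -1.0], t_end=5.0, dt=dt, rng=rng)
speeds = np.linalg.norm(np.diff(path, axis=0), axis=1) / dt
print(speeds.max())   # never exceeds L_MAX, so |x(t) - x(s)| <= L_MAX |t - s|
```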
A useful thing to have around would be the probability that the system will evolve from one state to another via a specific choice of one of these curves. Lipschitz-continuous curves are rectifiable, and so one can recapitulate a sort of Riemann sum. If you’re interested, I have it formally written down in a .pdf, but the current format is unfriendly to maths. So for now, you’ll just have to take my word for it when I say I can define the probability of the flow following a specific path. From there, it’s just a path integral away from defining the probability of getting from one state to another.
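The author’s actual construction is in the .pdf and is not reproduced here. As a loose discrete analogue only, assume a lattice walk with a hypothetical step distribution $P(x)$. The probability of one specific path is the product of its one-step probabilities (the role played by the Riemann sum), and summing over every path with a given endpoint (the role played by the path integral) gives the transition probability. The same distribution falls out of the Chapman-Kolmogorov recursion without enumerating paths.

```python
import itertools
from collections import defaultdict

MOVES = (-1, 0, 1)

def step_probs(x):
    """Hypothetical P(x) on a lattice: probabilities of moving by -1, 0, +1 from site x."""
    if x > 0:
        return (0.5, 0.3, 0.2)   # drift back toward the origin
    if x < 0:
        return (0.2, 0.3, 0.5)
    return (0.25, 0.5, 0.25)

def path_probability(path):
    """Probability of following one specific path: the product of its one-step probabilities."""
    prob = 1.0
    for x, y in zip(path, path[1:]):
        prob *= step_probs(x)[MOVES.index(y - x)]
    return prob

def transition_by_paths(a, n):
    """'Path integral': sum the probabilities of all n-step paths from a, grouped by endpoint."""
    totals = defaultdict(float)
    for moves in itertools.product(MOVES, repeat=n):
        path = [a]
        for m in moves:
            path.append(path[-1] + m)
        totals[path[-1]] += path_probability(path)
    return dict(totals)

def transition_by_recursion(a, n):
    """The same distribution via the Chapman-Kolmogorov recursion, without enumerating paths."""
    dist = {a: 1.0}
    for _ in range(n):
        new = defaultdict(float)
        for x, p in dist.items():
            for m, q in zip(MOVES, step_probs(x)):
                new[x + m] += p * q
        dist = new
    return dict(dist)

by_paths = transition_by_paths(0, 6)
by_recursion = transition_by_recursion(0, 6)
print(sum(by_paths.values()))
```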
Where to go from here?
Given this causal flow, d-separation should arise as a geometrical condition — but perhaps only a local one, for the causal structure of the system can also evolve with time. To intervene in this system is to project it onto a certain hyperplane, presumably, in some yet-to-be-determined way. And finally, there ought to be some way to define counterfactuals, but my limited mathematical foresight has already run too thin.
BONUS: If you’ve made it this far and can’t think of anything else to say, I’m willing to Crocker-entertain probabilities that I’m insane and/or a crackpot.
Maybe the individual trajectories of such a system could be described as solutions to some stochastic differential equation?
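A minimal sketch of that suggestion, assuming additive noise: the Euler-Maruyama scheme approximates a sample path of $dX = b(X)\,dt + \sigma\,dW$. The drift below is a hypothetical two-variable example, not something from the post.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, t_end, n_steps, rng):
    """Approximate one sample path of dX = drift(X) dt + sigma dW (additive noise)."""
    dt = t_end / n_steps
    xs = np.empty((n_steps + 1, len(x0)))
    xs[0] = x0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=len(x0))   # Brownian increment with variance dt
        xs[k + 1] = xs[k] + drift(xs[k]) * dt + sigma * dw
    return xs

def toy_drift(x):
    """Hypothetical drift: X1 relaxes to 0, X2 relaxes toward X1."""
    return np.array([-x[0], x[0] - x[1]])

rng = np.random.default_rng(0)
path = euler_maruyama(toy_drift, sigma=0.3, x0=[1.0, 0.0], t_end=5.0, n_steps=500, rng=rng)
```

One consequence worth keeping in mind: Brownian sample paths are almost surely nowhere differentiable and not Lipschitz, so this route would mean relaxing the Lipschitz assumption in the post.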
Also by analogy with continuous-time Markov processes, it might be easier to forget about individual trajectories and instead think about the “flow” of probability density in phase space, which can probably be described by a partial differential equation without needing to define your own Riemann sums, path integrals and such. Or maybe I’m missing something here and you really need customized machinery?
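For diffusions, the PDE in question is the forward Kolmogorov (Fokker-Planck) equation $\partial_t p = -\partial_x(b\,p) + D\,\partial_x^2 p$. Here is a minimal 1D sketch. It assumes drift $b(x) = -x$ and constant diffusion $D$, which is an Ornstein-Uhlenbeck process, chosen only because its stationary density $\mathcal{N}(0, D)$ is known. Writing the update as a difference of fluxes across cell faces keeps total probability conserved, which is the “flow of probability density” picture.

```python
import numpy as np

def fokker_planck_1d(p0, x, drift, D, dt, n_steps):
    """Explicit finite-volume solver for dp/dt = -d(b p)/dx + D d^2p/dx^2 with zero-flux ends.

    x holds uniformly spaced cell centres; drift is evaluated at the interior cell faces.
    Because each update is a difference of face fluxes, sum(p) * dx is conserved up to rounding.
    """
    dx = x[1] - x[0]
    b_face = drift(0.5 * (x[:-1] + x[1:]))
    p = p0.copy()
    for _ in range(n_steps):
        flux = np.zeros(len(p) + 1)   # flux[0] and flux[-1] stay 0: no probability leaves the grid
        flux[1:-1] = b_face * 0.5 * (p[:-1] + p[1:]) - D * (p[1:] - p[:-1]) / dx
        p -= dt / dx * (flux[1:] - flux[:-1])
    return p

x = np.linspace(-5.0, 5.0, 201)   # dx = 0.05
dx = x[1] - x[0]
p0 = np.where((x >= 0.5) & (x <= 1.5), 1.0, 0.0)
p0 /= p0.sum() * dx               # normalise to unit mass
p = fokker_planck_1d(p0, x, drift=lambda y: -y, D=0.5, dt=1e-3, n_steps=8000)
mass = p.sum() * dx
mean = (x * p).sum() * dx
var = (x ** 2 * p).sum() * dx - mean ** 2
print(mass, mean, var)            # mass stays 1; (mean, var) approach the stationary (0, D)
```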
Also Cosma Shalizi is an expert on both causal models and continuous-time stochastic processes, so maybe you could ask him or look at his work if you haven’t seen it already.
It might work. I haven’t thought about it yet.
I know of him and have read some of his stuff, but the work isn’t in a sufficiently stable state to bother an academic with it yet. I need more evidence that this is the fruitful path. I expect it would be difficult to convince him of the value of such an effort, since there’s no evidence yet that it’s even different from what is being done already.
May I suggest working out the graphical-model version of continuous-time Markov chains as an intermediate step?
(e.g. something like this: http://boa.unimib.it/bitstream/10281/19575/1/phd_unimib_040750.pdf)
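Here is a sketch of what that intermediate step might look like. Continuous-time Bayesian networks are one existing graphical formalism of this kind: each node has a conditional intensity matrix giving its switching rates as a function of its parents’ current values. I haven’t checked that this is what the linked thesis develops. The two-node network $X \to Y$ below uses made-up rates. Amalgamating it into a single generator over joint states shows how the graph constrains the chain: no transition changes both variables at once.

```python
import numpy as np

# Hypothetical rates for a two-node CTBN X -> Y over binary variables.
Q_X = np.array([[-1.0, 1.0],
                [2.0, -2.0]])                   # X: 0 -> 1 at rate 1, 1 -> 0 at rate 2
Q_Y_GIVEN_X = {0: np.array([[-0.5, 0.5],
                            [3.0, -3.0]]),      # while X = 0, Y prefers 0
               1: np.array([[-3.0, 3.0],
                            [0.5, -0.5]])}      # while X = 1, Y prefers 1

def joint_generator(q_x, q_y_given_x):
    """Amalgamate the CTBN into one 4x4 generator over joint states (x, y), indexed 2*x + y."""
    Q = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            s = 2 * x + y
            Q[s, 2 * (1 - x) + y] = q_x[x, 1 - x]             # X flips at its own rate
            Q[s, 2 * x + (1 - y)] = q_y_given_x[x][y, 1 - y]  # Y flips at a rate set by its parent X
            Q[s, s] = -Q[s].sum()                             # rows of a generator sum to zero
    return Q

def stationary(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = joint_generator(Q_X, Q_Y_GIVEN_X)
pi = stationary(Q)
p_y1_given_x0 = pi[1] / (pi[0] + pi[1])
p_y1_given_x1 = pi[3] / (pi[2] + pi[3])
print(p_y1_given_x0, p_y1_given_x1)   # Y = 1 is much more likely while X = 1
```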
paper-machine_2013 no longer believes anything he wrote about this last year, and no longer has the resources to start again from scratch, due to impending ABD status.
Good luck!
Thanks~! I’m going to need it ;_;;;
This looks really interesting. It is still rather sketchy, though, and my physics is not good enough to be confident about how to fill in the details. Would you mind sending along your pdf?
One minor quibble and then a more general remark:
This is not quite correct as a description of the SGS-Pearl system. For SGS-Pearl, a causal system is Markovian relative to a graph, not necessarily relative to time. That is, the graphical parents of a variable screen that variable off from all of its non-descendants. Assume a discrete-time model. There is no requirement that the graphical parents of a given variable live in the immediately previous time-slice; we could have delayed, direct causation. For example, suppose we had X(t=T) --> X(t=T+2) and also Y(t=T+1) --> X(t=T+2), where no other variable is a direct cause of X(t=T+2). Then X(t=T+2) is independent of its non-descendants given X(t=T) and Y(t=T+1).
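The screening-off in that example can be checked by simulation, assuming (for illustration only) a linear-Gaussian model with unit-variance noise. X(t=T+1) is added as a child of X(t=T), giving a non-descendant of X(t=T+2) that is correlated with it. That correlation vanishes once we condition on the two graphical parents, done here by regressing them out.

```python
import numpy as np

def residual(target, regressors):
    """Residual of a least-squares regression of target on regressors (plus an intercept)."""
    A = np.column_stack([np.ones(len(target))] + list(regressors))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

rng = np.random.default_rng(1)
n = 200_000
x_T = rng.normal(size=n)                 # X(t=T)
y_T1 = rng.normal(size=n)                # Y(t=T+1)
x_T1 = x_T + rng.normal(size=n)          # X(t=T+1): a child of X(T), not a cause of X(T+2)
x_T2 = x_T + y_T1 + rng.normal(size=n)   # X(t=T+2): direct causes are X(T) and Y(T+1) only

marginal = np.corrcoef(x_T1, x_T2)[0, 1]
partial = np.corrcoef(residual(x_T1, [x_T, y_T1]), residual(x_T2, [x_T, y_T1]))[0, 1]
print(marginal, partial)   # roughly 1/sqrt(6), about 0.41, and close to 0
```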
If you assume that all genuine, fully unrolled causal connections have the same time-interval, then your version of the Markov condition lines up with SGS-Pearl.
On its face, it is non-obvious how to state the usual global causal Markov axiom while assuming densely ordered times rather than non-densely ordered ones. Since in such a system no variable (graph vertex) will have any specifiable direct causes (parents), you can’t simply say that a variable is independent of its non-descendants conditional on its parents.
You also can’t just condition on a time-slice of ancestors (even assuming that such a slice screens off the past from the future), since there might be branching from a variable that lives between whatever slice you pick and the target variable. That is, suppose the times are densely ordered, and your target variable Z lives at time T2. Now, suppose that you condition on all the variables X at time T1 < T2. Since the ordering is dense, there are times between T1 and T2. For all we know, there might be a variable X(unlucky) that lives at one of the times between T1 and T2 such that X(unlucky) is a common cause of Z and some non-ancestor, non-descendant of Z, call it Y. In that case, we expect Y and Z to be associated in virtue of the common cause X(unlucky), even after conditioning on the slice at T1.
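The X(unlucky) scenario is easy to reproduce in a linear-Gaussian toy model, an assumption made for illustration. Conditioning on the whole slice at T1 leaves Y and Z correlated, because their common cause lives strictly between T1 and T2. Conditioning on X(unlucky) itself removes the correlation.

```python
import numpy as np

def residual(target, regressor):
    """Residual of a least-squares regression of target on one regressor (plus an intercept)."""
    A = np.column_stack([np.ones(len(target)), regressor])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

rng = np.random.default_rng(2)
n = 200_000
x_T1 = rng.normal(size=n)             # the conditioning slice at time T1
unlucky = x_T1 + rng.normal(size=n)   # X(unlucky), at a time strictly between T1 and T2
z = unlucky + rng.normal(size=n)      # target Z at time T2
y = unlucky + rng.normal(size=n)      # Y: neither an ancestor nor a descendant of Z

given_slice = np.corrcoef(residual(y, x_T1), residual(z, x_T1))[0, 1]
given_unlucky = np.corrcoef(residual(y, unlucky), residual(z, unlucky))[0, 1]
print(given_slice, given_unlucky)   # about 0.5 (not screened off) and about 0 (screened off)
```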
Thoughts?
I’ll find somewhere to stick it and link it to the post.
Yeah, sloppy writing on my part; the “time” that appears here is only an observer’s sequence of observations of the system’s state. I agree with what you say about discrete-time models.
What was assumed there is not that, because in this development I have not gotten to the point of finding a directed graph of variables anywhere. Presumably I’ll need a local corollary of Pearl’s theorem 1.4.1, i.e., every causal flow model (probably subject to some technical restrictions) locally induces a graph model with a compatible joint probability distribution. This has some hope of succeeding; if a distribution is consistent with a graph model, then small perturbations of it are also consistent with it.
This is what I meant to assume: that X is a continuous-time Markov process.
Ah! Yes, that makes sense. I’m looking forward to reading the paper.
Sorry, my real-life job intervened and killed most of this week. There’s a slight kink in the current draft that I need to rewrite (it’s not a game-breaker), but that’ll have to wait until I get some free time. I also need to find some free webspace somewhere; it seems that the Megaupload debacle killed all the free file-sending platforms.
My attempts at finding semi-stable webspace failed. In the meantime, for the sake of Nisan’s razor, here is a temporary link to the .pdf. It hasn’t been fixed yet; the ending is not very rigorous. I probably got a bit too excited near the end.
What’s the goal here? To say “yes, it exists, causality does exist for continuous time?” To use it for stuff? Because if it’s the latter I think a lot of loss of generality is gonna have to happen, particularly about what all these functions at each point in phase space are.
LaTeX in Less Wrong
Also, the wiki page on using LaTeX in Less Wrong.
It would be ideal if there were a script somewhere that eats an austere LaTeX file and spits out an html file.
Wouldn’t the right solution be to use MathJax?
Hm, yes. I don’t think I can do that, though, because I can’t put javascript into posts.
If LW would update the page template to have the script in the html header, I think we’d be set. Isn’t there a site admin for this?
I think this is critical, because rationality in the end needs mathematical support, and MathJax is really the de facto way of putting math in web posts at this point.
Someone once requested that Less Wrong implement jsMath, and it seems like it was declined. I just submitted a request for MathJax. I guess we’ll see what happens.
http://www.texify.com/ ?
That seems to accomplish the same thing as John Maxwell’s utility.
I have a file that contains text interspersed with many formulas in LaTeX math mode, delimited by dollar signs or whatever. I’d like something that will replace those $-delimited formulas with html image tags. I’ll probably write one myself when I need it.
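Here is a minimal sketch of such a script in Python. `RENDER_URL` is a hypothetical placeholder for whatever LaTeX-to-image service you use; I’m not describing the interface of texify.com or any other real service. The regex only handles inline `$...$` with `\$` as an escaped dollar; display math `$$...$$` is not handled.

```python
import re
from urllib.parse import quote

# Hypothetical rendering endpoint: substitute the LaTeX-to-image service you actually use.
RENDER_URL = "https://example.com/render?tex="

# $...$ where neither delimiter is escaped as \$
INLINE_MATH = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$", re.DOTALL)

def tex_to_img_tags(text, base_url=RENDER_URL):
    """Replace each $...$ formula with an <img> tag whose src asks base_url to render it."""
    def to_tag(match):
        tex = match.group(1)
        alt = tex.replace("&", "&amp;").replace('"', "&quot;").replace("<", "&lt;")
        return f'<img src="{base_url}{quote(tex)}" alt="{alt}" />'
    return INLINE_MATH.sub(to_tag, text)

print(tex_to_img_tags(r"The dispersion relation is $\omega^2 = k^2$."))
```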
You would be my personal hero for a period of time not exceeding a week, with at most ten possible thirty-second exceptions during that period.
I tweaked John Maxwell’s utility and came up with this thing. It only works for LessWrong posts, not comments.
EDIT: Now it works for comments too.
Time is real, so I’m not a fan of timelessness. However, while you have a conservative flow in your model, you can still work your way back to histories and thus to the reality of time within a history.
Consider how Bohmian mechanics does it. You have the Schrödinger evolution of a wavefunction, with a conserved flow of probability density. If you chart a trajectory through configuration space according to the gradient of the phase of the wavefunction, you end up with a timelike foliation of (configuration space × time) with nonintersecting trajectories. Add the usual probability measure, and voilà, you have a multiverse theory of self-contained worlds which neither split nor merge, and in which the Born rule applies.
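The guidance equation behind this is $v = \nabla S / m$ for $\psi = R\,e^{iS/\hbar}$, or equivalently $v = (\hbar/m)\,\mathrm{Im}(\nabla\psi/\psi)$. Below is a 1D sketch in units with $\hbar = m = 1$, an assumption for illustration. It is checked on a Gaussian packet whose phase is $kx$, so every Bohmian trajectory should move at speed $k$.

```python
import numpy as np

HBAR = 1.0   # natural units, assumed for illustration
MASS = 1.0

def bohmian_velocity(psi, x):
    """1D guidance equation: v = (hbar / m) * Im(psi' / psi), i.e. (1/m) dS/dx for psi = R exp(i S / hbar)."""
    dpsi = np.gradient(psi, x)   # central differences in the interior
    return (HBAR / MASS) * np.imag(dpsi / psi)

x = np.linspace(-5.0, 5.0, 2001)
k, sigma = 1.5, 1.0
psi = np.exp(-x ** 2 / (4 * sigma ** 2) + 1j * k * x)   # Gaussian packet, phase S = hbar k x
v = bohmian_velocity(psi, x)
print(np.allclose(v[1:-1], k, atol=1e-3))               # the phase gradient is k everywhere
```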
In principle you can do the same thing with one of those timeless-looking “wavefunctions of the universe” which show up in quantum cosmology. Here, instead of $H\psi = i\hbar\,\partial\psi/\partial t$, you just have $H\psi = 0$ (where the Hamiltonian is general relativity coupled to other fields). So instead of an evolving wavefunction on a “configuration space of the universe”, you just have a static wavefunction. But you can still take the gradient of $\psi$’s phase, everywhere in that configuration space, and so you can figure out Bohmian trajectories that divide up the universal configuration space into disjoint self-contained histories.
In practice, things are more complicated. In general relativity, you distinguish between coordinate time and physical time (proper time). The proper time which elapses along a specific timelike curve is an invariant, an objective quantity. But it is calculated from a metric, the exact form of which depends on the coordinate system. You can rescale coordinate time, according to some diffeomorphism, but then you adjust the metric accordingly, so that distances, angles, and durations remain the same. If you actually try to follow the program of Bohmian quantum gravity that I outlined, it’s hard to define the wavefunction of the universe without reifying a particular coordinate time, a step which is just like having a preferred frame in special relativity. I suspect that the answer lies in string theory’s holographic principle, which says that a quantum theory containing gravity is equivalent to another quantum theory that doesn’t contain gravity, and which is defined on the boundary of the space inhabited by the first theory. In terms of this second theory, the space away from the boundary is emergent, it’s made of composite degrees of freedom from the boundary theory. In the real world, it’s going to be time which is “emergent”, from the “renormalization group flow” of a Euclidean field theory defined at “past infinity”. In fact, excuse me while I run away and study the Bohmian trajectories for such a theory…
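The bookkeeping point about coordinate time can be checked in the flattest possible setting: special relativity in units with $c = 1$, metric $\mathrm{diag}(-1, 1)$, and a hypothetical worldline $x = Vt$ with $V = 0.6$ for $t \in [1, 4]$. Rescaling coordinate time by $t = s^2$ changes the metric component to $g_{ss} = -(2s)^2$, yet the integrated proper time $\int \sqrt{-g_{\mu\nu}\,\dot x^\mu \dot x^\nu}$ is unchanged. This illustrates only the invariance claim, not the quantum-gravity problem itself.

```python
import numpy as np

V = 0.6  # hypothetical constant speed, in units with c = 1

def trapezoid(y, t):
    """Trapezoidal rule on a sampled integrand."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

# Original coordinates (t, x): g_tt = -1, g_xx = 1, worldline x = V t, so dx/dt = V.
t = np.linspace(1.0, 4.0, 10_001)
dtau_dt = np.sqrt(-(-1.0 * np.ones_like(t) + V ** 2))
tau_original = trapezoid(dtau_dt, t)

# Rescaled coordinate time t = s**2 for s in [1, 2]; the metric absorbs (dt/ds)**2.
s = np.linspace(1.0, 2.0, 10_001)
g_ss = -(2 * s) ** 2
dx_ds = 2 * V * s                 # x = V s**2
dtau_ds = np.sqrt(-(g_ss + dx_ds ** 2))
tau_rescaled = trapezoid(dtau_ds, s)

print(tau_original, tau_rescaled, 3 * np.sqrt(1 - V ** 2))   # all three agree up to rounding
```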
Anyway, bringing this back to Judea Pearl: As soon as mathematics represents a history as a “trajectory” in a state space, it is already becoming a little “timeless” in a formal sense. Consider something as simple as a time series. You can plot it on a graph and now it’s a shape rather than a process. You can specify its properties in a timeless geometrical fashion, even though one of the directions on the graph represents time. In talking about flows on state spaces, I don’t think you’re doing more than this. So what you’re doing is harmless, from the perspective of a time-realist like myself, but it also doesn’t really embody the full revolution of Julian Barbour’s ontological timelessness, which necessarily involves both general relativity and quantum mechanics. General relativity makes proper time a physical variable, and quantum mechanics matters by way of many worlds: Barbour’s multiverse is one of “many moments” (he calls them time capsules). In order to interpret an unevolving wavefunction of the universe, rather than divide it up into trajectories, he completely pulverizes it into moments, one moment for each point in configuration space.
If you want to imitate Barbour’s timelessness, then the crucial step is the ontological one of denying that the moments have a unique past or future. But if you have a conservative causal flow, you can always string the moments together into specific histories, like the Bohmian trajectories. Technical difficulties for the definition of trajectories only enter for relativistic systems, because you want to avoid reifying a particular coordinate time. But for nonrelativistic systems, it looks like formal timelessness in a causal model (in the sense you describe) is just a change of perspective that’s always available and can always be reversed.
The point of timelessness is not to say that time is unreal, merely that it is superfluous.
It’s difficult for me to follow your comment. While I’m familiar with the theories you discuss (with the exception of string theory and quantum cosmology), I don’t see how some of them are linked to this. I’m not trying to do anything so great as unify quantum mechanics and general relativity.
Yes.
Time is no longer “one of the directions on the graph”. If you fix a trajectory, then it comes with its own time, but the more interesting object is the flow, which does not have any sense of time.
We agree that whatever I’m doing is mostly harmless.
That will have to wait for someone else. I haven’t read Barbour, and it sounds horrifically difficult.
Probably. But such a thing could still be worthwhile.