That is what A and B are: a randomly wandering variable A and its rate of change B.
Maybe I’m not quite understanding, but it seems to me that your argument relies on a rather broad definition of “causality”. B may be dependent on A, but to say that A “causes” B seems to ignore some important connotations of the concept.
I think what bugs me about it is that “causality” implies a directness of the dependency between the two events. At first glance, this example seems like a direct relationship. But I would argue that B is not caused by A alone, but by both A’s current and previous states. If you were to transform A so that a given B depended directly on a given A’, I think you would indeed see a correlation.
I realize that I’m kind of arguing in a circle here; what I’m ultimately saying is that the term “cause” ought to imply correlation, because that is more useful to us than a synonym for “determine”, and because that is more in line (to my mind, at least) with the generally accepted connotations of the word.
Maybe I’m not quite understanding, but it seems to me that your argument relies on a rather broad definition of “causality”. B may be dependent on A, but to say that A “causes” B seems to ignore some important connotations of the concept.
Very true. Once again, in the context of a Richard Kennaway post, I’m going to have to recommend the use of more precise concepts. Instead of “correlation”, we should be talking about “mutual information”, and it would be helpful if we used Judea Pearl’s definition of causality.
Mutual information between two variables means (among many equivalent definitions) how much you learn about one variable by learning the other. Statistical correlation is one way that there can be mutual information between two variables, but not the only way.
So, as JGWeissman said, there can be mutual information between the two series even in the absence of a statistical correlation that directly compares time t in one to time t in the other. For example, there is mutual information between sin(t) and cos(t): d(sin(t))/dt = cos(t), yet they’re simultaneously uncorrelated (i.e. uncorrelated when comparing time t to time t). The reason there is mutual information is that if you know sin(t), a simple time-shift tells you cos(t).
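A quick numerical sketch of that claim (my own illustration, using nothing beyond standard numpy):

```python
import numpy as np

# Over a whole number of periods, sin(t) and cos(t) are uncorrelated when
# compared at the same time t, yet a quarter-period shift of sin reproduces
# cos exactly, so knowing one series tells you the other.
t = np.linspace(0, 2 * np.pi, 10_000, endpoint=False)
s, c = np.sin(t), np.cos(t)

print(np.corrcoef(s, c)[0, 1])                      # ~ 0: simultaneously uncorrelated
print(np.corrcoef(np.sin(t + np.pi / 2), c)[0, 1])  # ~ 1: sin(t + pi/2) = cos(t)
```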
As for causation, the Pearl definition is (and my apologies, I may not get this right) that:
“A causes B iff, after learning A, nothing else at the time of A or B gives you information about B (and A is the minimal such set for which this is true).”
In other words, A causes B iff A is the minimal set conditional on which B is independent of everything else.
So, anyone want to rephrase Kennaway’s post with those definitions?
But I would argue that B is not caused by A alone, but by both A’s current and previous states.
This is the right idea. For small epsilon, B(t) should have a weak negative correlation with A(t - epsilon), a weak positive correlation with A(t + epsilon), and a strong positive correlation with the difference A(t + epsilon) - A(t - epsilon).
The function A causes the function B, but the value of A at time t does not cause the value of B at time t. Therefore the lack of correlation between A(t) and B(t) does not contradict causation implying correlation.
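Here is a minimal numerical sketch of that pattern. The model is my own assumption: the “randomly wandering” A is taken to be a slightly mean-reverting AR(1) series (with a completely unrestricted random walk the two weak correlations come out essentially zero), and B is its forward difference.

```python
import numpy as np

# A: a slightly mean-reverting AR(1) "wanderer" (assumed model).
# B: its forward-difference rate of change, with a unit time step.
rng = np.random.default_rng(0)
n, rho, eps = 200_000, 0.99, 1

noise = rng.normal(size=n)
A = np.zeros(n)
for i in range(1, n):
    A[i] = rho * A[i - 1] + noise[i]
B = A[1:] - A[:-1]

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

t = np.arange(eps, n - 1 - eps)             # indices where every lag below exists
print(corr(B[t], A[t]))                     # near zero
print(corr(B[t], A[t - eps]))               # weakly negative
print(corr(B[t], A[t + eps]))               # weakly positive
print(corr(B[t], A[t + eps] - A[t - eps]))  # strongly positive
```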
Therefore the lack of correlation between A(t) and B(t) does not contradict causation implying correlation.
Only trivially. Since B = dA/dt, the correlation between B and dA/dt is perfect. Likewise for any other relationship B = F(A): B correlates perfectly with F(A). But you would only compare B and F(A) if you already had some reason to guess they were related, and having done so would observe they were the same and not trouble with correlations at all.
If you do not know that B = dA/dt and have no reason to guess this hypothesis, correlations will tell you nothing, especially if your time series data has too large a time step—as positively recommended in the linked paper—to see dA/dt at all.
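To illustrate the time-step point (again my own sketch, assuming A is a fine-grained random walk, B is its rate of change on the fine grid, and both series are only recorded every k-th fine step):

```python
import numpy as np

# As the recording step grows, the finite difference of the recorded A
# stops resembling the recorded B, and their correlation collapses.
rng = np.random.default_rng(1)
dt, n = 0.001, 1_000_000
A = np.cumsum(rng.normal(scale=np.sqrt(dt), size=n))  # fine-grained random walk
B = np.diff(A) / dt                                   # rate of change on the fine grid

for k in (1, 10, 1000):                  # record only every k-th fine step
    A_rec, B_rec = A[::k], B[::k]
    dA_rec = np.diff(A_rec) / (k * dt)   # finite difference of the recorded A
    m = len(dA_rec)
    print(k, np.corrcoef(B_rec[:m], dA_rec)[0, 1])  # ~1 at k=1, near 0 at k=1000
```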
I don’t think you are arguing in a circle. B is caused by the current and previous values of A. Obviously we’re not going to see a correlation unless we control for the previous state of A. Properly controlled, the relationship between the two variables will be one-to-one, won’t it?
But I would argue that B is not caused by A alone, but by both A’s current and previous states.
Consider not the abstract situation of B = dA/dt, but the concrete example of the signal generator. It would be a perverse reading of the word “cause” to say that the voltage does not cause the current. You can make the current be anything you like by suitably manipulating the voltage.
But let this not degenerate into an argument about the “real” meaning of “cause”. Consider instead what is being said about the systems studied by the authors referenced in the post.
Lacerda, Spirtes, et al. do not follow your usage. They talk about time series equations in which the current state of each variable depends on the previous states of some variables, but they still draw causal graphs which have a node for every variable, not a node for every time instant of every variable. When x(i+1) = b y(i) + c z(i), they talk about y and z causing x.
The reason that none of their theorems apply to the system B = dA/dt is that when I discretise time and put this in the form of a difference equation, it violates the precondition they state in section 1.2.2. This will be true of the discretisation of any system of ordinary differential equations. It appears to me that that is a rather significant limitation of their approach to causal analysis.
Consider not the abstract situation of B = dA/dt, but the concrete example of the signal generator. It would be a perverse reading of the word “cause” to say that the voltage does not cause the current. You can make the current be anything you like by suitably manipulating the voltage.
But you can make a similar statement for just about any situation where B = dA/dt, so I think it’s useful to talk about the abstract case.
For example, you can make a car’s velocity anything you like by suitably manipulating its position. Would you then say that the car’s position “causes” its velocity? That seems awkward at best. You can control the car’s acceleration by manipulating its velocity, but to say “velocity causes acceleration” actually sounds backwards.
But let this not degenerate into an argument about the “real” meaning of “cause”. Consider instead what is being said about the systems studied by the authors referenced in the post.
But isn’t this really the whole argument? If the authors implied that every relationship between two functions implies correlation between their raw values, then that is, I think, self-evidently wrong. The question, then, is: do we imply correlation when we refer to causation? I think the answer is generally “yes”.
I think intervention is the key idea missing from the above discussion of which of the derivative function and the integrated function is the cause and which is the effect. In the signal generator example, voltage is a cause of current because we can intervene directly on the voltage. In the car example, acceleration is a cause of velocity because we can intervene directly on acceleration. This is not too helpful on its own, but maybe it will point the discussion in a useful direction.