I think the words Richard used in his question denoted the mutual information between the functions A and B, but that he meant to ask about the mutual information between two time series datasets sampled from A and B over the same interval.
And my point was that this is an irrelevant comparison. When you look at the datasets, you want to know whether they are mutually informative (whether learning one can tell you something about the other). A linear statistical correlation—which Kennaway showed is absent—is one way that the datasets can be mutually informative, but it is not the only way.
If you know the ordered, timewise development of each variable, you have extra information to use. If you discard this knowledge of the time ordering, and are left with just simultaneous pairs (pairs of the form [A(t0), B(t0)]), then yes, as Kennaway points out, you’re hosed. So?
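To make that concrete, here is a minimal sketch (my own illustration, assuming the running example is A(t) = sin(t) with B = dA/dt = cos(t), sampled densely over one period): the simultaneous pairs have essentially zero linear correlation, yet a finite difference applied to the ordered A series recovers the B series almost exactly.

```python
import numpy as np

# Stand-in for the example in this thread (an assumption on my part):
# B is the exact time derivative of A, with A(t) = sin(t), B(t) = cos(t).
t = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
dt = t[1] - t[0]
A = np.sin(t)
B = np.cos(t)

# 1) Throw away the time ordering and keep only simultaneous pairs [A(t0), B(t0)]:
#    the linear correlation over a full period is essentially zero.
print("corr of simultaneous pairs:", np.corrcoef(A, B)[0, 1])

# 2) Keep the ordered series and differentiate A numerically:
#    the reconstruction tracks B almost exactly.
B_hat = np.gradient(A, dt)
print("corr of finite-difference estimate with B:", np.corrcoef(B_hat, B)[0, 1])
print("max reconstruction error:", np.max(np.abs(B_hat - B)))
```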
One could ask both questions, but as Cyan points out, if you know the function A of this example exactly, then you also know B exactly. What do you know about B, though, when you know A only approximately, for example, by sampling a time series? As the sampling interval increases beyond the autocorrelation time of A, the amount of information you get about B converges to zero, in the sense that, given all of both series up to A(t) and B(t-1), the distribution of B(t) is almost identical to its unconditional distribution.
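A rough numerical illustration of that (a sketch with stand-in assumptions of my own: A is a band-limited random signal, so it has a finite correlation time tau and an exact derivative B): once the sampling interval pushes past tau, a finite-difference estimate of B(t) built from the sampled A series carries essentially no information about the true B(t).

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth random process with a roughly known correlation time (my own stand-in):
# a sum of sinusoids with frequencies up to f_max, so tau is on the order of 1 / f_max.
f_max = 1.0
tau = 1.0 / f_max
freqs = rng.uniform(0.1 * f_max, f_max, size=50)
phases = rng.uniform(0.0, 2.0 * np.pi, size=50)
amps = rng.normal(size=50)

def A(t):
    return np.sum(amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None]), axis=0)

def B(t):  # exact derivative of A
    return np.sum(amps[:, None] * 2 * np.pi * freqs[:, None]
                  * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None]), axis=0)

T = 2000.0  # record length, long compared to tau
for dt in [0.05 * tau, 0.2 * tau, tau, 3 * tau, 10 * tau]:
    t = np.arange(0.0, T, dt)
    a = A(t)
    b_true = B(t)[1:-1]                   # true derivative at the interior sample points
    b_hat = (a[2:] - a[:-2]) / (2 * dt)   # centered finite difference from the samples
    r = np.corrcoef(b_hat, b_true)[0, 1]
    print(f"dt = {dt/tau:5.2f} * tau  ->  corr(estimate, true B) = {r:+.3f}")
```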
I’m sure there is a general technical definition, BTW, even though I haven’t seen it. This is not a rhetorical question.
My whole argument rests on a weaker reed than I first appreciated, because the definition of mutual information I linked is for univariate random variables. When I searched for a definition of mutual information for stochastic processes, all I could really find was various people writing that it was a generalization of mutual information for random variables in “the natural way”. But the point you bring up is actually a step in the direction of a stronger argument, not a weaker one. Sampling the function to get a time series makes a vector-valued random variable out of a stochastic process, and numerical differentiation on that random vector is still deterministic. My argument then follows from the definition of multivariate mutual information.
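For what it's worth, the "still deterministic" part can be made concrete with a small sketch (my own, under the assumption that "numerical differentiation" means a fixed finite-difference scheme): the scheme is just a constant matrix applied to the sample vector, so the numerical derivative is a deterministic function of that vector.

```python
import numpy as np

n, dt = 8, 0.1

# Forward-difference operator as a fixed (n-1) x n matrix:
# (D @ a)[k] = (a[k+1] - a[k]) / dt, the same matrix for every realization of a.
D = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / dt

a = np.random.default_rng(1).normal(size=n)   # any realization of the sampled A series
b_numeric = D @ a                             # numerical derivative of the sample vector
assert np.allclose(b_numeric, np.diff(a) / dt)
```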
Sampling the function to get a time series makes a vector-valued random variable out of a stochastic process, and numerical differentiation on that random vector is still deterministic.
This is not correct. Given the vector of all values of A sampled at intervals dt, the derivative of that vector—that is, the time series for B—is not determined by the vector itself, only by the complete trajectory of A. The longer dt is, the less the vector tells you about B.
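A concrete way to see the first claim (a construction of my own, not from the thread): add to A a component that vanishes at every sample point. The two trajectories then have identical sample vectors, but their derivatives at those same points differ by eps*pi/dt, which can be made as large as you like by scaling eps.

```python
import numpy as np

dt, eps = 0.5, 0.2
t = np.arange(0.0, 20.0, dt)     # the sample points, spacing dt

A1 = lambda x: np.sin(x)                                  # one trajectory
A2 = lambda x: np.sin(x) + eps * np.sin(np.pi * x / dt)   # agrees with A1 at every sample point

# The sample vectors are identical...
assert np.allclose(A1(t), A2(t), atol=1e-12)

# ...but the true derivatives (the B series) at those same points are not:
B1 = np.cos(t)
B2 = np.cos(t) + eps * (np.pi / dt) * np.cos(np.pi * t / dt)
print("max |B1 - B2| at the sample points:", np.max(np.abs(B1 - B2)))   # ~ eps * pi / dt
```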
True. I was also assuming that