First, imagine that we ignore the time data. Now we just have a bunch of temperature data points [T0, T1, …] and strain data points [e0, e1, …]. In fact, in order to truly ignore time data, we cannot even order the points according to time! But that means that we no longer have any way to line up the points T0 with e0, T1 with e1, etc. Without any way to match up temperature points to corresponding strain points, the temperature and strain data are randomly ordered, and the correlation disappears!
That is not how d-separation works. If you control for time, the temperature and the strain on the tuning fork should be uncorrelated. We would expect the temperature to roughly follow a 24-hour cycle, plus some random noise. If the tuning fork has a 24-hour period, then we would expect the same thing to be true of the strain. But it would be very strange if the random noise in the temperature were correlated with the random noise in the mechanical strain (e.g. if, once we already knew it was 3 pm, reading the thermometer on the window told us something about the strain on the tuning fork). That could plausibly happen if the material that the tuning fork is made of has different properties at different temperatures, changing the strain, but I think you meant to imply that neither one of these causes the other, so I’ll ignore that for now.
In a Bayes net, not conditioning on a variable doesn’t mean that you stop lining up the data into samples with a value for each variable, and declare that everything is uncorrelated with everything else; it just means that you keep the data lined up in samples and look for correlations without paying attention to the variable you are not conditioning on. In this case, the temperature and mechanical strain should be very highly correlated if you do not condition on time, because time will still be there as a lurking variable.
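To make the distinction concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the hourly sampling, the amplitudes and noise levels, the two cycles being in phase, and the helper names (`pearson`, `simulate`, `residuals_given_hour`). It compares three things: the correlation with samples kept lined up and time simply ignored, the correlation after the pairing is scrambled, and the correlation after conditioning on time of day by subtracting each hour's mean.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(days=200, seed=0):
    """Hourly samples; both variables follow a 24-hour cycle plus independent noise.

    The numbers here are made up for illustration, not taken from the post.
    """
    rng = random.Random(seed)
    hours, temps, strains = [], [], []
    for i in range(days * 24):
        hour = i % 24
        cycle = math.sin(2 * math.pi * hour / 24)
        hours.append(hour)
        temps.append(20 + 5 * cycle + rng.gauss(0, 1))   # temperature
        strains.append(cycle + rng.gauss(0, 0.2))        # strain, arbitrary units
    return hours, temps, strains


def residuals_given_hour(hours, values):
    """Condition on time of day by subtracting the mean value for each hour."""
    sums, counts = [0.0] * 24, [0] * 24
    for h, v in zip(hours, values):
        sums[h] += v
        counts[h] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [v - means[h] for h, v in zip(hours, values)]


hours, temps, strains = simulate()

# 1. Not conditioning on time: samples stay lined up, time is just not looked at.
lined_up = pearson(temps, strains)

# 2. Scrambling the pairing: this is NOT what "not conditioning" means.
shuffled_strains = strains[:]
random.Random(1).shuffle(shuffled_strains)
shuffled = pearson(temps, shuffled_strains)

# 3. Conditioning on time: correlate what is left after removing the hourly mean.
given_time = pearson(
    residuals_given_hour(hours, temps),
    residuals_given_hour(hours, strains),
)

print(f"lined up, time ignored: {lined_up:.3f}")
print(f"pairing shuffled:       {shuffled:.3f}")
print(f"conditioned on time:    {given_time:.3f}")
```

Under these assumptions the first number should come out large and the other two close to zero, which is the pattern you would expect when time is a common cause and neither variable drives the other.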
Yup, you’re right. That’s the right way to handle it, and it yields time as the common cause of temperature and strain, as we’d expect.
Now that I’m knee-deep in it, I do think this crazy concept of separating sets of values of variables from the mappings between the sets has something to it. It isn’t necessary for the example in the main post, but I think the example I gave with mice in one of the other comments still applies. The mapping between points is legitimately a variable unto itself, so it seems like it should be possible to handle it like other variables. It might even be useful to do so, since the mapping is nonparametric.
Anyway, thanks for giving a proper analysis of the problem.
I’m not sure what you see in it. For the mouse thing, it seems to suggest that the correlation between the variables causes the identity of the mouse, which is, like the time thing, exactly the wrong way round. You say it “isn’t necessary for the example in the main post”, but it’s more than unnecessary: it gives an answer that’s completely backwards.
That’s exactly why it’s interesting.