Real-world data often has the surprising property of “dimensionality reduction”: a small number of latent variables explain a large fraction of the variance in the data.
Why is that surprising? The causal structure of the world is very sparse, by the nature of causality. One cause has several effects, so once you scale up to lots of causative variables, you expect to find that large portions of the variance in your data are explained by only a few causal factors.
Causality is indeed the skeleton of data. And oh boy, wait until you hit hierarchical Bayes models!
Only, the variables that explain a lot usually aren’t the variables that are immediately visible – instead they’re hidden from us, and in order to model reality, we need to discover them, which is the function that PCA serves.
Not quite. PCA helps you reduce dimensionality by discovering the directions of variation in your feature space that explain most of the variance (in fact, it gives a total ordering of the directions of variation by how much variance each explains). Then there’s Independent Component Analysis, which instead separates your feature data into maximally statistically independent components; independence is a stronger condition than the mere uncorrelatedness (orthogonality) that PCA gives you.
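To make the PCA side concrete, here’s a minimal sketch (assuming NumPy; the loadings are made-up numbers): a single latent cause drives three observed features, and PCA recovers one dominant direction of variation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: ONE latent cause drives three observed features (the loadings
# are arbitrary made-up numbers), plus a little independent noise.
latent = rng.normal(size=(500, 1))
loadings = np.array([[2.0, -1.0, 0.5]])
X = latent @ loadings + 0.1 * rng.normal(size=(500, 3))

# PCA via SVD of the centered data matrix: the singular values come out
# sorted, which gives the total ordering of directions by explained variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)
# explained[0] is close to 1: one hidden variable explains almost everything.
```

Three observed dimensions collapse onto essentially one latent direction, which is exactly the “dimensionality reduction” property the quoted passage describes.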
The causal structure of the world is very sparse, by the nature of causality.
Can you expand your reasoning? We do see around us sparse — that is, understandable — causal systems. And even chaotic ones often give rise to simple properties (e.g. motion of huge numbers of molecules → gas laws). But why (ignoring anthropocentric arguments) would one expect to see this?
There are really just three ways the causal structure of reality could go:
Many causes → one effect
One cause → one effect, strictly
One cause → many effects
Since the last of these (one cause → many effects) generates more (apparent) random variables than the other two, most observables will end up deriving from a relatively sparse causal structure, even if we assume that the causal structures themselves are sampled uniformly from these three options.
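This counting argument can be sketched numerically. In this toy model (the fan-out width of 5 is an arbitrary assumption), mechanisms are sampled uniformly from the three structures, and we count which structure each resulting observable traces back to.

```python
import random

random.seed(0)
FAN = 5  # assumed width of the "many" side; any value > 1 gives the same effect

observables = {"many->one": 0, "one->one": 0, "one->many": 0}
for _ in range(30_000):
    # Sample a causal structure uniformly from the three options.
    kind = random.choice(sorted(observables))
    # "one cause -> many effects" contributes FAN observables; the others, one each.
    observables[kind] += FAN if kind == "one->many" else 1

share_fanout = observables["one->many"] / sum(observables.values())
# share_fanout comes out near 5/7: most observables derive from a single cause,
# even though only a third of the mechanisms are fan-out structures.
```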
So, for instance, parameter-space compression (a topic that deserves its own explanation), i.e. the hierarchical structure of reality, actually does follow the first item: many micro-level causes give rise to a single macro-level observable. But you’ll still find that most observables come from non-compressive causal structures.
This is why we actually have to work really hard to find out about micro-scale phenomena (things lower on the hierarchy than us): they have fewer observables whose variance is uniquely explicable by reference to a micro-scale causal structure.
I need that expanded a lot more. Why not many causes → many effects, for example?
Ah, you mean a densely interconnected “almost all to almost all” causal structure. Well, I’d have to guess: because that would look far more like random behavior than causal order, so we wouldn’t even notice it as something to causally analyze!
We do notice turbulence as something that doesn’t look random, and that is hard-to-impossible to causally analyze.
Here’s an anecdote. I can’t copy and paste it, but it’s in the middle column.
This is a very interesting point. PCA (or, in its time- and/or space-series form, the Karhunen–Loève expansion, also known as POD, proper orthogonal decomposition) has not been found to be useful for turbulence modeling, as I recall. There’s a brief section in Pope’s book on turbulence about modeling with this. From what I understand, POD is mostly used for visualization, not to help build models. (It’s worth noting that while my background in fluid dynamics is strong, I know little to nothing about PCA and the like beyond what they apparently do.)
Maybe I don’t actually understand causality, but in terms of modeling, we do have a good model (the Navier-Stokes, or N-S, equations), and so in some sense it’s clear what causes what. In principle, if you run a computer simulation with these equations and the correct boundary conditions, the result will be reasonably accurate. This has been demonstrated through direct simulations of some relatively simple cases, like flow through a channel. So that’s not the issue. The actual issue is that you need a lot of computing power to simulate even basic flows, and attempts to develop lower-order models have been fairly unsuccessful. So as a model, N-S is of limited utility as-is.
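The “lot of computing power” point can be made quantitative with a standard back-of-the-envelope estimate (this is textbook material, covered e.g. in Pope): the smallest eddies you must resolve shrink relative to the domain as the Reynolds number Re grows.

```latex
% Kolmogorov length scale, from viscosity \nu and mean dissipation rate \varepsilon:
\eta = \left( \frac{\nu^3}{\varepsilon} \right)^{1/4}
% Ratio of the largest scale L to the smallest grows with Reynolds number:
\frac{L}{\eta} \sim \mathrm{Re}^{3/4}
% so a 3-D grid resolving everything needs
N \sim \left( \frac{L}{\eta} \right)^3 \sim \mathrm{Re}^{9/4}
% points, and with time stepping the total work grows roughly like \mathrm{Re}^{3}.
```

At the Reynolds numbers of engineering interest, this is what makes direct simulation intractable outside of simple cases.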
In my view, the “turbulence problem” comes down to two facts: 1. the N-S equations are chaotic (sensitive to initial conditions, so small changes can cause big effects), and 2. they exhibit large scale separation (the smallest details you need to resolve, the Kolmogorov scales in most cases, are much smaller than the physical dimensions of the problem, say the length of a wing). To understand these points better, imagine that rigid-body dynamics were inaccurate (say, for modeling the trajectory of a baseball), and you had to model all the individual atoms to get it right. And if one atom were off, that might have a big effect. Obviously that’s a lot harder, and it’s probably computationally intractable outside of a few simple cases. (The chaos part is “avoided” because you would probably simulate an ensemble of initial conditions via Monte Carlo or something similar, and get an “ensemble mean” to compare against experiment. This works well from what I understand, even if the details are unclear.)
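The ensemble idea in that last parenthetical can be illustrated with any chaotic system; here’s a sketch using the logistic map (a stand-in for N-S, chosen only because it is one line of code): individual trajectories are hopelessly sensitive to initial conditions, but ensemble statistics are reproducible.

```python
import random

# Logistic map at r = 4, a standard chaotic system: tiny differences in
# the initial condition are amplified exponentially with each step.
def trajectory(x0, steps=50):
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

# Two initial conditions differing by 1e-12 typically end up nowhere near
# each other after 50 steps; the individual trajectory is unpredictable.
a = trajectory(0.2)
b = trajectory(0.2 + 1e-12)

# But the ensemble mean over many random initial conditions is stable:
random.seed(1)
ens1 = sum(trajectory(random.random()) for _ in range(20_000)) / 20_000
random.seed(2)
ens2 = sum(trajectory(random.random()) for _ in range(20_000)) / 20_000
# ens1 and ens2 agree to a couple of decimal places (both near 0.5),
# even though no single trajectory is predictable.
```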
So in some sense, yes, this looks like an “almost all to almost all” causal structure. Though, having looked up a bit about causal diagrams, it’s not even clear to me how you might draw one for turbulence, and not because of turbulence itself: it’s not clear to me what an “event” would be. There isn’t even a precise definition of “turbulence” to begin with, so maybe this should be expected. I suppose on some level such things are arbitrary, and you could define an event to be fluid movement in some direction, for each direction, each point in space, and each time. I’m not sure if anyone has done this sort of analysis.
(For the incompressible N-S equations, you can easily say that everything causes everything: the speed of sound is effectively infinite, so changes in one place are felt everywhere instantaneously, which is what makes the equations elliptic. In other words, the “domain of dependence” is everywhere. But I don’t know whether those effects are substantial. Obviously, in reality you don’t notice something quiet happening far away, even after the sound waves have had time to reach you. In practice, this means that doing incompressible fluid dynamics requires solving an elliptic PDE, which can be a pain for reasons unrelated to turbulence.)
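The “domain of dependence is everywhere” remark can be made explicit. Taking the divergence of the incompressible momentum equation and using incompressibility (∇·u = 0) kills the time derivative, leaving an elliptic Poisson equation for the pressure:

```latex
% Incompressible momentum equation (density \rho, kinematic viscosity \nu):
\partial_t \mathbf{u} + (\mathbf{u} \cdot \nabla)\mathbf{u}
    = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{u}
% Take the divergence; with \nabla \cdot \mathbf{u} = 0 the time-derivative
% and viscous terms vanish, leaving an elliptic equation for the pressure:
\nabla^2 p = -\rho \, \nabla \cdot \left[ (\mathbf{u} \cdot \nabla)\mathbf{u} \right]
```

Because there is no time derivative here, the pressure at every point adjusts to the velocity field everywhere at the same instant, which is the elliptic, everything-affects-everything character just described.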