It is rarely too difficult to specify the true model (or a space of models containing the true model). What’s hard is updating on less-than-fully-informative evidence or, in some cases, even computing what the true model predicts at all (i.e. likelihoods). So when we say it is “too costly to model from first principles”, we should keep in mind that we don’t mean the true model space can’t even be written down efficiently. In particular, this means that “every member of the set of models available to us is false” need not hold. Similarly, Bayesian probability and Ockham’s razor and whatnot can still apply, but we need efficient approximations.
(Side note: “different processes may become important in future” is not actually a problem for Ockham’s razor per se. That’s a problem for causal models, and Bayesian probability + Ockham’s razor are quite capable of learning causal models.)
(Another side note: likelihoods are never actually zero, they’re just very small. But likelihoods are very small for any large amount of data anyway, so there’s nothing unusual about that; a model space which doesn’t contain the true model isn’t really a problem from that perspective.)
If we want to attack these sorts of problems rigorously from first principles, then the central challenge is to find rigorous approximations of the true underlying models. The main field I know of which studies this sort of problem directly is statistical mechanics, and a number of reasonably-general-purpose tools exist in that field which could potentially be applied in other areas (e.g. this). Actually developing those applications, however, is an area of active research.
That said… when I look at the history of failure of “statistical”, non-first-principles models in various fields (especially economics), it looks like they mainly fail because they don’t handle causality properly. That makes sense—the theory of causality is a relatively recent development, so of course 20th-century stats people built models which failed to handle it. Armed with modern tools, it’s entirely plausible that we can handle causality well without having to ground everything in first-principles.
Thanks for your detailed reply. (And sorry I couldn’t format the below well—I don’t seem to get any formatting options in my browser.)
“It is rarely too difficult to specify the true model...this means that “every member of the set of models available to us is false” need not hold”
I agree we could find a true model to explain the economy, climate etc. (presumably the theory of everything in physics). But we don’t have the computational power to make predictions of such systems with that model—so my question is about how we should make predictions when the true model is not practically applicable. By “the set of models available to us”, I meant the models we could actually afford to make predictions with. If the true model is not in that set, then it seems to me that all of these models must be false.
‘”different processes may become important in future” is not actually a problem for Ockham’s razor per se. That’s a problem for causal models’
To take the climate example, say scientists had figured out that there was a biological feedback that kicks in once global warming has gone past 2C (e.g. bacteria become more efficient at decomposing soil and releasing CO2). Suppose you have one model that includes a representation of that feedback (e.g. as a subprocess) and one that does not but is equivalent in every other way (e.g. is coded like the first model but lacks the subprocess). Then isn’t the second model simpler according to metrics like the minimum description length, so that it would be weighted higher if we penalised models using such metrics? But this seems the wrong thing to do, if we think the first model is more likely to give a good prediction.
Now the thought that occurred to me when writing that is that the data the scientists used to deduce the existence of the feedback ought to be accounted for by the models that are used, and this would give low posterior weight to models that don’t include the feedback. But doing this in practice seems hard. Also, it’s not clear to me if there would be a way to tell between models that represent the process but don’t connect it properly to predicting the climate e.g. they have a subprocess that says more CO2 is produced by bacteria at warming higher than 2C, but then don’t actually add this CO2 to the atmosphere, or something.
“likelihoods are never actually zero, they’re just very small”
If our models were deterministic, then if they were not true, wouldn’t it be impossible for them to produce the observed data exactly, so that the likelihood of the data given any of those models would be zero? (Unless there was more than one process that could give rise to the same data, which seems unlikely in practice.) Now if we make the models probabilistic and try to design them such that there is a non-zero chance that the data would be a possible sample from the model, then the likelihood can be non-zero. But it doesn’t seem necessary to do this—models that are false can still give predictions that are useful for decision-making. Also, it’s not clear if we could make a probabilistic model that would have non-zero likelihoods for something as complex as the climate that we could run on our available computers (and that isn’t something obviously of low value for prediction like just giving probability 1/N to each of N days of observed data). So it still seems like it would be valuable to have a principled way of predicting using models that give a zero likelihood of the data.
“the central challenge is to find rigorous approximations of the true underlying models. The main field I know of which studies this sort of problem directly is statistical mechanics, and a number of reasonably-general-purpose tools exist in that field which could potentially be applied in other areas (e.g. this).”
Yes I agree. Thanks for the link—it looks very relevant and I’ll check it out. Edit—I’ll just add, echoing part of my reply to Kenny’s answer, that whilst statistical averaging has got human modellers a certain distance, adding representations of processes whose effects get missed by the averaging seems to add a lot of value (e.g. tropical thunderstorms in the case of climate). So there seems to be something additional to averaging that can be used, to do with coming up with simplified models of processes you can see are missed out by the averaging.
On causality, whilst of course correcting this is desirable, if the models we can afford to compute with can’t reproduce the data, then presumably they are also not reproducing the correct causal graph exactly? And any causal graph we could compute with will not be able to reproduce the data? (Else it would seem that a causal graph could somehow hugely compress the true equations without information loss—great if so!)
Side note: one topic I’ve been reading about recently which is directly relevant to some of your examples (e.g. thunderstorms) is multiscale modelling. You might find it interesting.
Thanks, yes this is very relevant to thinking about climate modelling, with the dominant paradigm being that we can separately model phenomena above and below the resolved scale—there’s an ongoing debate, though, about whether a different approach would work better, and it gets tricky when the resolved scale gets close to the size of important types of weather system.
To take the climate example, say scientists had figured out that there was a biological feedback that kicks in once global warming has gone past 2C (e.g. bacteria become more efficient at decomposing soil and releasing CO2). Suppose you have one model that includes a representation of that feedback (e.g. as a subprocess) and one that does not but is equivalent in every other way (e.g. is coded like the first model but lacks the subprocess). Then isn’t the second model simpler according to metrics like the minimum description length, so that it would be weighted higher if we penalised models using such metrics? But this seems the wrong thing to do, if we think the first model is more likely to give a good prediction.
The trick here is that the data on which the model is trained/fit has to include whatever data the scientists used to learn about that feedback loop in the first place. As long as that data is included, the model which accounts for it will have lower minimum description length. (This fits in with a general theme: the minimum-complexity model is simple and general-purpose; the details are learned from the data.)
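As a toy illustration of that accounting (every number below is invented purely for illustration), we can compare two models by total description length: bits to encode the model itself, plus bits to encode the data given the model (its negative log likelihood). Once the observations that revealed the feedback are included in the data, the feedback-aware model wins despite its longer code:

```python
# Toy MDL comparison. All quantities are made-up stand-ins:
# total description length = bits to encode the model
#                          + bits to encode the data given the model
#                            (negative log2-likelihood).

def description_length(model_bits, log2_likelihood):
    return model_bits - log2_likelihood

# Model A: no feedback subprocess (shorter code).
# Model B: includes the feedback subprocess (longer code).
model_bits_A, model_bits_B = 1000, 1200

# Log2-likelihood of the climate record alone (similar for both models).
climate_ll_A, climate_ll_B = -50_000, -50_000

# Log2-likelihood of the soil/bacteria experiments that revealed the
# feedback: model A, lacking the mechanism, compresses them far worse.
soil_ll_A, soil_ll_B = -8_000, -3_000

mdl_A = description_length(model_bits_A, climate_ll_A + soil_ll_A)
mdl_B = description_length(model_bits_B, climate_ll_B + soil_ll_B)
assert mdl_B < mdl_A  # feedback model wins once that data is included
```

The point the sketch makes: the comparison flips depending on whether the feedback-revealing data is inside the dataset being compressed.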
Now the thought that occurred to me when writing that is that the data the scientists used to deduce the existence of the feedback ought to be accounted for by the models that are used, and this would give low posterior weight to models that don’t include the feedback. But doing this in practice seems hard.
… I’m responding as I read. Yup, exactly. As the Bayesians say, we do need to account for all our prior information if we want reliably good results. In practice, this is “hard” in the sense of “it requires significantly more complicated programming”, but not in the sense of “it increases the asymptotic computational complexity”. The programming is more complicated mainly because the code needs to accept several qualitatively different kinds of data, and custom code is likely needed for hooking up each of them. But that’s not a fundamental barrier; it’s still the same computational challenges which make the approach impractical.
it’s not clear to me if there would be a way to tell between models that represent the process but don’t connect it properly to predicting the climate...
Again, we need to include whatever data allowed scientists to connect it to the climate in the first place. (In some cases this is just fundamental physics, in which case it’s already in the model.)
If our models were deterministic, then if they were not true, wouldn’t it be impossible for them to produce the observed data exactly, so that the likelihood of the data given any of those models would be zero? (Unless there was more than one process that could give rise to the same data, which seems unlikely in practice.)
Picture a deterministic model which uses fundamental physics, and models the joint distribution of position and momentum of every atom comprising the Earth. The unknown in this model is the initial conditions—the initial position and momentum of every particle (also particle identity, i.e. which element/isotope each is, but we’ll ignore that). Now, imagine how many of the possible initial conditions are compatible with any particular high-level data we observe. It’s a massive number!
Point is: the deterministic part of a fundamental physical model is the dynamics; the initial conditions are still generally unknown. Conceptually, when we fit the data, we’re mostly looking for initial conditions which match. So zero likelihoods aren’t really an issue; the issue is computing with a joint distribution over the positions and momenta of so many particles. That’s what statistical mechanics is for.
whilst statistical averaging has got human modellers a certain distance, adding representations of processes whose effects get missed by the averaging seems to add a lot of value
The corresponding problem in statistical mechanics is to identify the “state variables”—the low-level variables whose averages correspond to macroscopic observables. For instance, the ideal gas law uses density, kinetic energy, and force on container surfaces (whose macroscopic averages correspond to density, temperature, and pressure). Fluid flow, rather than averaging over the whole system, uses density and particle velocity within each little cell of space.
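A minimal numerical sketch of that correspondence (helium gas, with a made-up temperature and an assumed Maxwell-Boltzmann velocity distribution): temperature really is just an average over a microscopic state variable.

```python
import numpy as np

# Toy check that a macroscopic observable (temperature) is an average
# of a microscopic state variable (kinetic energy). Numbers assumed.
rng = np.random.default_rng(0)
k_B = 1.380649e-23      # Boltzmann constant, J/K
m = 6.6335e-27          # mass of a helium atom, kg
T_true = 300.0          # kelvin

# Sample particle velocities from the Maxwell-Boltzmann distribution:
# each Cartesian component is Gaussian with variance k_B * T / m.
v = rng.normal(0.0, np.sqrt(k_B * T_true / m), size=(1_000_000, 3))

# Equipartition: <(1/2) m v^2> = (3/2) k_B T, so T is recovered as a
# microscopic average, without tracking any individual particle.
mean_ke = 0.5 * m * (v**2).sum(axis=1).mean()
T_est = 2.0 * mean_ke / (3.0 * k_B)
assert abs(T_est - T_true) < 2.0
```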
The point: if an effect is “missed by averaging”, that’s usually not inherent to averaging as a technique. The problem is that people average over poorly-chosen features.
Jaynes argued that the key to choosing high-level features is reproducibility: what high-level variables do experimenters need to control in order to get a consistent result distribution? If we consistently get the same results without holding X constant (where X includes e.g. initial conditions of every particle), then apparently X isn’t actually relevant to the result, so we can average out X. Also note that there are some degrees of freedom in what “results” we’re interested in. For instance, turbulence has macroscopic behavior which depends on low-level initial conditions, but the long-term time average of forces from a turbulent flow usually doesn’t depend on low-level initial conditions—and for engineering purposes, it’s often that time average which we actually care about.
if the models we can afford to compute with can’t reproduce the data, then presumably they are also not reproducing the correct causal graph exactly? And any causal graph we could compute with will not be able to reproduce the data?
Once we move away from stat mech and approximations of low-level models, yes, this becomes a problem. However, two counterpoints. First, this is the sort of problem where the output says “well, the best model is one with like a gazillion edges, and there’s a bunch that all fit about equally well, so we have no idea what will happen going forward”. That’s unsatisfying, but at least it’s not wrong. Second, if we do get that sort of result, then it probably just isn’t possible to do better with the high-level variables chosen. Going back to reproducibility and selection of high-level variables: if we’ve omitted some high-level variable which really does impact the results we’re interested in, then “we have no idea what will happen going forward” really is the right answer.
I think I need to think more about the likelihood issue. I still feel like we might be thinking about different things—when you say “a deterministic model which uses fundamental physics”, this would not be in the set of models that we could afford to run to make predictions for complex systems. For the models we could afford to run, it seems to me that no choice of initial conditions would lead them to match the data we observe, except by extreme coincidence (analogous to a simple polynomial just happening to pass through all the datapoints produced by a much more complex function).
I’ve gone through Jaynes’ paper now from the link you gave. His point about deciding what macroscopic variables matter is well-made. But you still need a model of how the macroscopic variables you observe relate to the ones you want to predict. In modelling atmospheric processes, simple spatial averaging of the fluid dynamics equations over resolved spatial scales gets you some way, but then changing the form of the function relating the future to present states (“adding representations of processes” as I put it before) adds additional skill. And Jaynes’ paper doesn’t seem to say how you should choose this function.
For the models we could afford to run, it seems to me that no choice of initial conditions would lead them to match the data we observe, except by extreme coincidence (analogous to a simple polynomial just happening to pass through all the datapoints produced by a much more complex function).
Ok, let’s talk about computing with error bars, because it sounds like that’s what’s missing from what you’re picturing.
The usual starting point is linear error propagation—we assume that errors are small enough for a linear approximation to be valid. (After this we’ll talk about how to remove that assumption.) We have some multivariate function $f(x)$: imagine that $x$ is the full state of our simulation at some timestep, and $f$ calculates the state at the next timestep. The value $\bar{x}$ of $x$ in our program is really just an estimate of the “true” value $x$; it has some error $\Delta x = x - \bar{x}$. As a result, the value $\bar{f}$ of $f$ in our program also has some error $\Delta f = f - \bar{f}$. Assuming the error is small enough for the linear approximation to hold, we have:

$$\Delta f = f - \bar{f} = f(\bar{x} + \Delta x) - f(\bar{x}) \approx \left(\frac{df}{dx}\bigg|_{\bar{x}}\right) \Delta x$$

where $\frac{df}{dx}$ is the Jacobian, i.e. the matrix of derivatives of every entry of $f(x)$ with respect to every entry of $x$.

Next, assume that $\Delta x$ has covariance matrix $\Sigma_x$, and we want to compute the covariance matrix $\Sigma_f$ of $\Delta f$. We have a linear relationship between $\Delta x$ and $\Delta f$, so we use the usual formula for a linear transformation of covariance:

$$\Sigma_f = \frac{df}{dx} \, \Sigma_x \, {\frac{df}{dx}}^T$$

Now imagine iterating this at every timestep: we compute the timestep itself, then differentiate that timestep, and multiply our previous uncertainty on both sides by the derivative matrix to get the new uncertainty:

$$\bar{x}(t+1) = f(\bar{x}(t))$$

$$\Sigma_x(t+1) = \left(\frac{df}{dx}\bigg|_{\bar{x}(t)}\right) \Sigma_x(t) \left(\frac{df}{dx}\bigg|_{\bar{x}(t)}\right)^T$$
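A minimal numerical sketch of this iteration, with a made-up two-dimensional dynamics and a hand-derived Jacobian (everything here is purely illustrative, not a real simulation):

```python
import numpy as np

# Propagate a point estimate and its error covariance through repeated
# timesteps of a toy nonlinear dynamics f.
def f(x):
    return np.array([x[0] + 0.1 * x[1],
                     x[1] - 0.1 * np.sin(x[0])])

def jacobian(x):
    # df/dx evaluated at x, worked out by hand for the toy f above
    return np.array([[1.0, 0.1],
                     [-0.1 * np.cos(x[0]), 1.0]])

x_bar = np.array([0.5, -0.2])   # point estimate of the state
Sigma = 0.01 * np.eye(2)        # covariance of the estimate's error

for _ in range(100):
    J = jacobian(x_bar)         # differentiate the timestep...
    x_bar = f(x_bar)            # ...propagate the estimate...
    Sigma = J @ Sigma @ J.T     # ...and propagate the uncertainty
```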
Now, a few key things to note:
For most systems of interest, that uncertainty is going to grow over time, usually exponentially. That’s correct: in a chaotic system, if the initial conditions are uncertain, then of course we should become more and more uncertain about the system’s state over time.
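A toy demonstration of that exponential growth, using the logistic map (a standard chaotic system, chosen here purely for illustration):

```python
# Two trajectories of the chaotic logistic map x -> 4x(1-x), with
# initial conditions differing by only 1e-12.
x, y = 0.3, 0.3 + 1e-12
max_sep = 0.0
for _ in range(60):
    x, y = 4 * x * (1 - x), 4 * y * (1 - y)
    max_sep = max(max_sep, abs(x - y))

# The tiny initial discrepancy has grown by ~11 orders of magnitude.
assert max_sep > 0.1
```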
Those formulas only propagate uncertainty in previous state to uncertainty in the next state. Really, there’s also new uncertainty introduced at each timestep, e.g. from error in f itself (i.e. due to averaging) or from whatever’s driving the system. Typically, such errors are introduced as an additive term—i.e. we compute the covariance in x introduced by each source of error, and add them to the propagated covariance matrix at each timestep.
Actually storing the whole covariance matrix would take O(n^2) space if x has n elements, which is completely impractical when x is the whole state of a finite element simulation. We make this practical the same way we make all matrix operations practical in numerical computing: exploit sparsity/structure. This is application-specific, but usually the covariance can be well-approximated as the sum of sparse “local” covariances and low-rank “global” covariances.
Likewise with the update: we don’t actually want to compute the n-by-n derivative matrix and then matrix-multiply with the covariance. Most backpropagation libraries expose the derivative as a linear operator rather than an explicit matrix, and we want to use it that way. Again, specifics will vary, depending on the structure of f and of the (approximated) covariance matrix.
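Here’s a sketch of that matrix-free style, with a made-up dynamics and a hand-written Jacobian-vector product standing in for what an autodiff library would provide (the toy f is linear, so the hand-written JVP is exact). The covariance is kept as a low-rank factor L with Σ ≈ LLᵀ, so one timestep of uncertainty propagation is just k applications of the linear operator:

```python
import numpy as np

# Propagate a low-rank covariance factor using the Jacobian only as a
# linear operator, never as an explicit n-by-n matrix.
def f(x):
    # toy "timestep": each cell couples to its right-hand neighbour
    return x + 0.1 * np.roll(x, -1)

def jvp(v):
    # directional derivative of f in direction v, written by hand here;
    # autodiff libraries (e.g. jax.jvp) expose this operation directly
    return v + 0.1 * np.roll(v, -1)

n, k = 10_000, 5                    # big state, small covariance rank
rng = np.random.default_rng(1)
x = rng.normal(size=n)
L = 0.01 * rng.normal(size=(n, k))  # Sigma ≈ L L^T, stored in O(n k)

# One timestep: the new factor is J @ L, column by column, so the new
# covariance is (J L)(J L)^T = J Sigma J^T without ever forming J.
L = np.stack([jvp(L[:, i]) for i in range(k)], axis=1)
x = f(x)
```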
In many applications, we have data coming in over time. That data reduces our uncertainty every time it comes in—at that point, we effectively have a Kalman filter. If enough data is available, the uncertainty remains small enough for the linear approximation to continue to hold, and the whole thing works great.
If the uncertainty does become too large for linear approximation, then we need to resort to other methods for representing uncertainty, rather than just a covariance matrix. Particle filters are one simple-but-effective fallback, and can be combined with linear uncertainty as well.
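A bare-bones bootstrap particle filter sketch, on a made-up one-dimensional system (dynamics and noise levels invented for illustration): the posterior is represented by a cloud of samples rather than a covariance matrix, so no linearity assumption is needed.

```python
import numpy as np

# Bootstrap particle filter: propagate samples through the dynamics,
# weight them by the observation likelihood, then resample.
rng = np.random.default_rng(3)

def step(x):                        # toy nonlinear dynamics
    return np.sin(2.5 * x)

n_particles = 1000
particles = rng.normal(0.0, 1.0, n_particles)   # samples from the prior
x_true = 0.4

for _ in range(30):
    x_true = step(x_true) + rng.normal(0, 0.05)
    z = x_true + rng.normal(0, 0.2)             # noisy observation
    # propagate each particle through dynamics plus process noise
    particles = step(particles) + rng.normal(0, 0.05, n_particles)
    # weight by observation likelihood, then resample
    w = np.exp(-0.5 * ((z - particles) / 0.2) ** 2)
    w /= w.sum()
    particles = rng.choice(particles, size=n_particles, p=w)

estimate = particles.mean()
```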
In general, if this sounds interesting and you want to know more, it’s covered in a lot of different contexts. I first saw most of it in an autonomous vehicles course; besides robotics, it’s also heavily used in economic models, and sometimes systems/control theory courses will focus on this sort of stuff.
Is this starting to sound like a model for which the observed data would have nonzero probability?
Do you mean you’d be adding the probability distribution with that covariance matrix on top of the mean prediction from f, to make it a probabilistic prediction? I was talking about deterministic predictions before, though my text doesn’t make that clear. For probabilistic models, yes, adding an uncertainty distribution may result in non-zero likelihoods. But if we know the true dynamics are deterministic (pretend there’s no quantum effects, which are largely irrelevant for our prediction errors for systems in the classical physics domain), then we still know the model is not true, and so it seems difficult to interpret p if we were to do Bayesian updating.
Likelihoods are also not obviously (to me) very good measures of model quality for chaotic systems, either—in these cases we know that even if we had the true model, its predictions would diverge from reality due to errors in the initial condition estimates, but it would trace out the correct attractor—and it’s the attractor geometry (conditional on boundary conditions) that we’d really want to assess. Perhaps the true model would then have a higher likelihood than every other model, but it’s not obvious to me, and it’s not obvious that there isn’t a better metric for leading to good inferences when we don’t have the true model.
Basically the logic that says to use Bayes for deducing the truth does not seem to carry over in an obvious way (to me) to the case when we want to predict but can’t use the true model.
But if we know the true dynamics are deterministic (pretend there’s no quantum effects, which are largely irrelevant for our prediction errors for systems in the classical physics domain), then we still know the model is not true, and so it seems difficult to interpret p if we were to do Bayesian updating.
Ah, that’s where we need to apply more Bayes. The underlying system may be deterministic at the macroscopic level, but that does not mean we have perfect knowledge of all the things which effect the system’s trajectory. Most of the uncertainty in e.g. a weather model would not be quantum noise, it would be things like initial conditions, measurement noise (e.g. how close is this measurement to the actual average over this whole volume?), approximation errors (e.g. from discretization of the dynamics), driving conditions (are we accounting for small variations in sunlight or tidal forces?), etc. The true dynamics may be deterministic, but that doesn’t mean that our estimates of all the things which go into those dynamics have no uncertainty. If the inputs have uncertainty (which of course they do), then the outputs also have uncertainty.
The main point of probabilistic models is not to handle “random” behavior in the environment, it’s to quantify uncertainty resulting from our own (lack of) knowledge of the system’s inputs/parameters.
Likelihoods are also not obviously (to me) very good measures of model quality for chaotic systems, either—in these cases we know that even if we had the true model, its predictions would diverge from reality due to errors in the initial condition estimates, but it would trace out the correct attractor...
Yeah, you’re pointing to an important issue here, although it’s not actually likelihoods which are the problem—it’s point estimates. In particular, that makes linear approximations a potential issue, since they’re implicitly approximations around a point estimate. Something like a particle filter will do a much better job than a Kalman filter at tracing out an attractor, since it accounts for nonlinearity much better.
Anyway, reasoning with likelihoods and posterior distributions remains valid regardless of whether we’re using point estimates. When the system is chaotic but has an attractor, the posterior probability of the system state will end up smeared pretty evenly over the whole attractor. (Although with enough fine-grained data, we can keep track of roughly where on the attractor the system is at each time, which is why Kalman-type models work well in that case.)
So when we say it is “too costly to model from first principles”, we should keep in mind that we don’t mean the true model space can’t even be written down efficiently.
I’m confused. Are you really claiming that modeling the Earth’s climate can be written down “efficiently”? What exactly do you mean by ‘efficiently’? What would a sketch of an efficient description of the “true model space” for the Earth’s climate be?
Extreme answer: just point AIXI at wikipedia. That’s a bit tongue-in-cheek, but it illustrates the concepts well. The actual models (i.e. AIXI) can be very general and compact; rather than AIXI, a specification of low-level physics would be a more realistic model to use for climate. Most of the complexity of the system is then learned from data—i.e. historical weather data, a topo map of the Earth, composition of air/soil/water samples, etc. An exact Bayesian update of a low-level physical model on all that data should be quite sufficient to get a solid climate model; it wouldn’t even take an unrealistic amount of data (data already available online would likely suffice). The problem is that we can’t efficiently compute that update, or efficiently represent the updated model—we’re talking about a joint distribution over positions and momenta of every particle comprising the Earth, and that’s even before we account for quantum. But the prior distribution over positions and momenta of every particle we can represent easily—just use something maxentropic, and the data will be enough to figure out the (relevant parts of the) rest.
So to answer your specific questions:
the “true model space” is just low-level physics
by “efficiently”, I mean the code would be writable by a human and the “training” data would easily fit on your hard drive
Yeah, the usual mechanism by which more data reduces computational difficulty is by directly identifying the values some previously-latent variables. If we know the value of a variable precisely, then that’s easy to represent; the difficult-to-represent distributions are those where there’s a bunch of variables whose uncertainty is large and tightly coupled.
Think of it as a kind of (theoretical) ‘upper bound’ on the problem. None of the actual computable (i.e. on real-world computers built by humans) approximations to AIXI are very good in practice.
The AIXI thing was a joke; a Bayesian update on low-level physics with unknown initial conditions would be superexponentially slow, but it certainly isn’t uncomputable. And the distinction does matter—uncomputability usually indicates fundamental barriers even to approximation, whereas superexponential slowness does not (at least in this case).
In a sense, existing climate models are already “low-level physics” except that “low-level” means coarse aggregates of climate/weather measurements that are so big that they don’t include tropical cyclones! And, IIRC, those models are so expensive to compute that they can only be computed on supercomputers!
But I’m still confused as to whether you’re claiming that someone could implement AIXI and feed it all the data you mentioned.
the prior distribution over positions and momenta of every particle we can represent easily—just use something maxentropic, and the data will be enough to figure out the (relevant parts of the) rest.
You seem to be claiming that “Wikipedia” (or all of the scientific data ever measured) would be enough to generate “the prior distribution over positions and momenta of every particle” and that this data would easily fit on a hard drive. Or are you claiming that such an efficient representation exists in theory? I’m still skeptical of the latter.
The problem is that we can’t efficiently compute that update, or efficiently represent the updated model—we’re talking about a joint distribution over positions and momenta of every particle comprising the Earth, and that’s even before we account for quantum.
This makes me believe that you’re referring to some kind of theoretical algorithm. I understood the asker to wanting something (efficiently) computable, at least relative to actual current climate models (i.e. something requiring no more than supercomputers to use).
But I’m still confused as to whether you’re claiming that someone could implement AIXI and feed it all the data you mentioned.
That was a joke, but computable approximations of AIXI can certainly be implemented. For instance, a logical inductor run on all that data would be conceptually similar for our purposes.
You seem to be claiming that “Wikipedia” (or all of the scientific data ever measured) would be enough to generate “the prior distribution over positions and momenta of every particle” and that this data would easily fit on a hard drive.
No, wikipedia or a bunch of scientific data (much less than all the scientific data ever measured) would be enough data to train a solid climate model from a simple prior over particle positions and momenta. It would definitely not be enough to learn the position and momentum of every particle; a key point of stat mech is that we do not need to learn the position and momentum of every particle in order to make macroscopic predictions. A simple maxentropic prior over microscopic states plus a (relatively) small amount of macroscopic data is enough to make macroscopic predictions.
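A toy version of that claim, for a deliberately simple “microscopic” system (independent ±1 spins, an assumption made purely for illustration): a maxent prior plus a single macroscopic observation yields a macroscopic prediction, with no knowledge of any individual spin.

```python
import numpy as np

# System (assumed): N spins, each +1 or -1. The only data is the
# observed mean magnetization m; individual spins stay unknown.
N, m_obs = 1000, 0.3

# The maxent distribution over a single spin subject to <s> = m is
# Bernoulli with p(+1) = (1 + m) / 2 (the exponential-family solution).
p_up = (1 + m_obs) / 2

# Macroscopic prediction from the maxent model: fluctuations of the
# total magnetization, Var(sum of spins) = N * (1 - m^2).
var_pred = N * (1 - m_obs**2)

# Sanity check against brute-force sampling of microstates.
rng = np.random.default_rng(4)
n_up = rng.binomial(N, p_up, size=200_000)  # up-spins per microstate
var_emp = (2.0 * n_up - N).var()
assert abs(var_emp - var_pred) / var_pred < 0.05
```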
This makes me believe that you’re referring to some kind of theoretical algorithm.
The code itself need not be theoretical, but it would definitely be superexponentially slow to run. Making it efficient is where stat mech, multiscale modelling, etc come in. The point I want to make is that the system’s “complexity” is not a fundamental barrier requiring fundamentally different epistemic principles.
… wikipedia or a bunch of scientific data (much less than all the scientific data ever measured), would be enough data to train a solid climate model from a simple prior over particle positions and momenta. It would definitely not be enough to learn the position and momentum of every particle; a key point of stat mech is that we do not need to learn the position and momentum of every particle in order to make macroscopic predictions. A simple maxentropic prior over microscopic states plus a (relatively) small amount of macroscopic data is enough to make macroscopic predictions.
That’s clearer to me, but I’m still skeptical that that’s in fact possible. I don’t understand how the prior can be considered “over particle positions and momenta”, except via the theories and models of statistical mechanics, i.e. assuming that those microscopic details can be ignored.
The point I want to make is that the system’s “complexity” is not a fundamental barrier requiring fundamentally different epistemic principles.
I agree with this. But I think you’re eliding how much work is involved in what you described as:
Making it efficient is where stat mech, multiscale modelling, etc come in.
I wouldn’t think that standard statistical mechanics would be sufficient for modeling the Earth’s climate. I’d expect fluid dynamics is also important, as well as chemistry, geology, the dynamics of the Sun, etc. It’s not obvious to me that statistical mechanics would be effective alone in practice.
Ah… I’m talking about stat mech in a broader sense than I think you’re imagining. The central problem of the field is the “bridge laws” defining/expressing macroscopic behavior in terms of microscopic behavior. So, e.g., deriving Navier-Stokes from molecular dynamics is a stat mech problem. Of course we still need the other sciences (chemistry, geology, etc) to define the system in the first place. The point of stat mech is to take low-level laws with lots of degrees of freedom, and derive macroscopic laws from them. For very coarse, high-level models, the “low-level model” might itself be e.g. fluid dynamics.
I think you’re eliding how much work is involved in what you described as...
Yeah, this stuff definitely isn’t easy. As you argued above, the general case of the problem is basically AGI (and also the topic of my own research). But there are a lot of existing tricks and the occasional reasonably-general tool, especially in the multiscale modelling world and in Bayesian stat mech.
Yes, I don’t think we really disagree. My prior (prior to this extended comments discussion) was that there are lots of wonderful existing tricks, but there’s no real shortcut for the fully general problem and any such shortcut would be effectively AGI anyways.
climate models are already “low-level physics” except that “low-level” means coarse aggregates of climate/weather measurements that are so big that they don’t include tropical cyclones!
Just as an aside, a typical modern climate model will simulate tropical cyclones as emergent phenomena from the coarse-scale fluid dynamics, albeit not enough of the most intense ones. Though, much smaller tropical thunderstorm-like systems are much more crudely represented.
Tangential, but now I’m curious… do you know what discretization methods are typically used for the fluid dynamics? I ask because insufficiently-intense cyclones sound like exactly the sort of thing APIC methods were made to fix, but those are relatively recent and I don’t have a sense for how much adoption they’ve had outside of graphics.
do you know what discretization methods are typically used for the fluid dynamics?
There’s a mixture—finite differencing used to be used a lot but seems to be less common now; semi-Lagrangian advection seems to have taken over in models that used it; and some models work by doing most of the computations in spectral space and neglecting the smallest spatial scales. Recently, newer methods have been developed to work better on massively parallel computers. It’s not my area, though, so I can’t give a very expert answer—but I’m pretty sure the people working on it think hard about trying not to smooth out intense structures (though that has to be balanced against maintaining numerical stability).
How much are ‘graphical’ methods like APIC incorporated elsewhere in general?
My intuition has certainly been pumped to the effect that models that mimic visual behavior are likely to be useful more generally, but maybe that’s not a widely shared intuition.
I would have hoped that was the case, but it’s interesting that both large and small ones are apparently not so easily emergent.
I wonder whether the models are so coarse that the cyclones that do emerge are in a sense the minimum size. That would readily explain the lack of smaller emergent cyclones. Maybe larger ones don’t emerge because the ‘next larger size’ is too big for the models. I’d think ‘scaling’ of eddies in fluids might be informative: what’s the smallest eddy possible in some fluid? What other eddy sizes are observed (or can be modeled)?
Not sure if this was intended to be rhetorical, but a big part of what makes turbulence difficult is that we see eddies at many scales, including very small eddies (at least down to the scale that Navier-Stokes holds). I remember a striking graphic about the onset of turbulence in a pot of boiling water, in which the eddies repeatedly halve in size as certain parameter cutoffs are passed, and the number of eddies eventually diverges—that’s the onset of turbulence.
Sorry for being unclear – it was definitely not intended to be rhetorical!
Yes, turbulence was exactly what I was thinking about. At some small enough scale, we probably wouldn’t expect to ‘find’ or be able to distinguish eddies. So there’s probably some minimum size. But then is there any pattern or structure to the larger sizes of eddies? For (an almost certainly incorrect) example, maybe all eddies are always a multiple of the minimum size and the multiple is always an integer power of two. Or maybe there is no such ‘discrete quantization’ of eddy sizes, though eddies always ‘split’ into nested halves (under certain conditions).
It certainly seems to be the case, though, that eddies aren’t possible as emergent phenomena at a scale smaller than the discretization of the approximation itself.
I wonder whether the models are so coarse that the cyclones that do emerge are in a sense the minimum size.
It’s not my area, but I don’t think that’s the case. My impression is that part of what drives very high wind speeds in the strongest hurricanes is convection on the scale of a few km in the eyewall, so models with that sort of spatial resolution can generate realistically strong systems, but that’s ~20x finer than typical climate model resolutions at the moment, so it will be a while before we can simulate those systems routinely (though, some argue we could do it if we had a computer costing a few billion dollars).
It seems like it might be an example of relatively small structures having potentially arbitrarily large long-term effects on the state of the entire system.
It could be the case, though, that the overall effects of cyclones are still statistical at the scale of the entire planet’s climate.
Regardless, it’s a great example of the kind of thing for which we don’t yet have good general learning algorithms.
Thanks for your detailed reply. (And sorry I couldn’t format the below well—I don’t seem to get any formatting options in my browser.)
“It is rarely too difficult to specify the true model...this means that “every member of the set of models available to us is false” need not hold”
I agree we could find a true model to explain the economy, climate etc. (presumably the theory of everything in physics). But we don’t have the computational power to make predictions of such systems with that model—so my question is about how we should make predictions when the true model is not practically applicable. By “the set of models available to us”, I meant the models we could actually afford to make predictions with. If the true model is not in that set, then it seems to me that all of these models must be false.
‘”different processes may become important in future” is not actually a problem for Ockham’s razor per se. That’s a problem for causal models’
To take the climate example, say scientists had figured out that there were a biological feedback that kicks in once global warming has gone past 2C (e.g. bacteria become more efficient at decomposing soil and releasing CO2). Suppose you have one model that includes a representation of that feedback (e.g. as a subprocess) and one that does not but is equivalent in every other way (e.g. is coded like the first model but lacks the subprocess). Then isn’t the second model simpler according to metrics like the minimum description length, so that it would be weighted higher if we penalised models using such metrics? But this seems the wrong thing to do, if we think the first model is more likely to give a good prediction.
Now the thought that occurred to me when writing that is that the data the scientists used to deduce the existence of the feedback ought to be accounted for by the models that are used, and this would give low posterior weight to models that don’t include the feedback. But doing this in practice seems hard. Also, it’s not clear to me whether there would be a way to tell apart models that represent the process but don’t connect it properly to predicting the climate, e.g. they have a subprocess that says more CO2 is produced by bacteria at warming higher than 2C, but then don’t actually add this CO2 to the atmosphere, or something.
“likelihoods are never actually zero, they’re just very small”
If our models were deterministic, then if they were not true, wouldn’t it be impossible for them to produce the observed data exactly, so that the likelihood of the data given any of those models would be zero? (Unless there was more than one process that could give rise to the same data, which seems unlikely in practice.) Now if we make the models probabilistic and try to design them such that there is a non-zero chance that the data would be a possible sample from the model, then the likelihood can be non-zero. But it doesn’t seem necessary to do this—models that are false can still give predictions that are useful for decision-making. Also, it’s not clear if we could make a probabilistic model that would have non-zero likelihoods for something as complex as the climate that we could run on our available computers (and that isn’t something obviously of low value for prediction like just giving probability 1/N to each of N days of observed data). So it still seems like it would be valuable to have a principled way of predicting using models that give a zero likelihood of the data.
“the central challenge is to find rigorous approximations of the true underlying models. The main field I know of which studies this sort of problem directly is statistical mechanics, and a number of reasonably-general-purpose tools exist in that field which could potentially be applied in other areas (e.g. this).”
Yes I agree. Thanks for the link—it looks very relevant and I’ll check it out. Edit—I’ll just add, echoing part of my reply to Kenny’s answer, that whilst statistical averaging has got human modellers a certain distance, adding representations of processes whose effects get missed by the averaging seems to add a lot of value (e.g. tropical thunderstorms in the case of climate). So there seems to be something additional to averaging that can be used, to do with coming up with simplified models of processes you can see are missed out by the averaging.
On causality, whilst of course correcting this is desirable, if the models we can afford to compute with can’t reproduce the data, then presumably they are also not reproducing the correct causal graph exactly? And any causal graph we could compute with will not be able to reproduce the data? (Else it would seem that a causal graph could somehow hugely compress the true equations without information loss—great if so!)
Side note: one topic I’ve been reading about recently which is directly relevant to some of your examples (e.g. thunderstorms) is multiscale modelling. You might find it interesting.
Thanks, yes this is very relevant to thinking about climate modelling, with the dominant paradigm being that we can separately model phenomena above and below the resolved scale—there’s an ongoing debate, though, about whether a different approach would work better, and it gets tricky when the resolved scale gets close to the size of important types of weather system.
The trick here is that the data on which the model is trained/fit has to include whatever data the scientists used to learn about that feedback loop in the first place. As long as that data is included, the model which accounts for it will have lower minimum description length. (This fits in with a general theme: the minimum-complexity model is simple and general-purpose; the details are learned from the data.)
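As a toy numerical illustration of this point (a made-up piecewise-linear “feedback”, not a real climate mechanism), we can use BIC as a crude stand-in for minimum description length: once the data covering the above-2C regime is in the fit, the model with the extra feedback term wins despite its longer description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the climate example: the "true" response is linear in
# the forcing x, plus a feedback that only kicks in above x = 2 (made up).
x = np.linspace(0.0, 4.0, 200)
y = 1.0 * x + 2.0 * np.maximum(0.0, x - 2.0) + rng.normal(0.0, 0.1, x.size)

def bic(residuals, k):
    """Bayesian information criterion (lower is better): the k*log(n) term
    plays the role of the description-length penalty on extra parameters."""
    n = residuals.size
    return n * np.log(np.mean(residuals ** 2)) + k * np.log(n)

# Model 1: linear only -- the shorter description.
r1 = y - np.polyval(np.polyfit(x, y, 1), x)

# Model 2: linear plus the feedback term -- one extra parameter.
X = np.column_stack([x, np.maximum(0.0, x - 2.0), np.ones_like(x)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = y - X @ coef

# With the >2 data included in the fit, the extra parameter pays for
# itself: model 2 scores better despite being longer.
print(bic(r1, 2), bic(r2, 3))
```

If the data were truncated below x = 2, the penalty term would favor the simpler model instead, which is exactly the point about needing to include the data that revealed the feedback.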
… I’m responding as I read. Yup, exactly. As the Bayesians say, we do need to account for all our prior information if we want reliably good results. In practice, this is “hard” in the sense of “it requires significantly more complicated programming”, but not in the sense of “it increases the asymptotic computational complexity”. The programming is more complicated mainly because the code needs to accept several qualitatively different kinds of data, and custom code is likely needed for hooking up each of them. But that’s not a fundamental barrier; it’s still the same computational challenges which make the approach impractical.
Again, we need to include whatever data allowed scientists to connect it to the climate in the first place. (In some cases this is just fundamental physics, in which case it’s already in the model.)
Picture a deterministic model which uses fundamental physics, and models the joint distribution of position and momentum of every atom comprising the Earth. The unknown in this model is the initial conditions—the initial position and momentum of every particle (also particle identity, i.e. which element/isotope each is, but we’ll ignore that). Now, imagine how many of the possible initial conditions are compatible with any particular high-level data we observe. It’s a massive number!
Point is: the deterministic part of a model of a fundamental physical model is the dynamics; the initial conditions are still generally unknown. Conceptually, when we fit the data, we’re mostly looking for initial conditions which match. So zero likelihoods aren’t really an issue; the issue is computing with a joint distribution over position and momentum of so many particles. That’s what statistical mechanics is for.
The corresponding problem in statistical mechanics is to identify the “state variables”—the low-level variables whose averages correspond to macroscopic observables. For instance, the ideal gas law uses density, kinetic energy, and force on container surfaces (whose macroscopic averages correspond to density, temperature, and pressure). Fluid flow, rather than averaging over the whole system, uses density and particle velocity within each little cell of space.
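A tiny numerical version of the ideal-gas example (standard kinetic theory; the particle count and the helium-like mass are just picked for concreteness): a macroscopic observable like temperature is literally an average over a sampled microstate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Microstate: velocities of N gas particles of mass m at temperature T_true,
# each velocity component drawn from the Maxwell-Boltzmann distribution,
# Normal(0, sqrt(k_B * T_true / m)).
k_B, m, T_true, N = 1.380649e-23, 6.6e-27, 300.0, 200_000
v = rng.normal(0.0, np.sqrt(k_B * T_true / m), size=(N, 3))

# Macroscopic observable: temperature recovered as an average over the
# microstate, via <(1/2) m |v|^2> = (3/2) k_B T.
T_est = m * np.mean(np.sum(v ** 2, axis=1)) / (3.0 * k_B)
print(T_est)  # close to 300 for large N
```

The microscopic details (which particle has which velocity) wash out; only the average survives, which is why these particular state variables work.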
The point: if an effect is “missed by averaging”, that’s usually not inherent to averaging as a technique. The problem is that people average over poorly-chosen features.
Jaynes argued that the key to choosing high-level features is reproducibility: what high-level variables do experimenters need to control in order to get a consistent result distribution? If we consistently get the same results without holding X constant (where X includes e.g. initial conditions of every particle), then apparently X isn’t actually relevant to the result, so we can average out X. Also note that there are some degrees of freedom in what “results” we’re interested in. For instance, turbulence has macroscopic behavior which depends on low-level initial conditions, but the long-term time average of forces from a turbulent flow usually doesn’t depend on low-level initial conditions—and for engineering purposes, it’s often that time average which we actually care about.
Once we move away from stat mech and approximations of low-level models, yes, this becomes a problem. However, two counterpoints. First, this is the sort of problem where the output says “well, the best model is one with like a gazillion edges, and there’s a bunch that all fit about equally well, so we have no idea what will happen going forward”. That’s unsatisfying, but at least it’s not wrong. Second, if we do get that sort of result, then it probably just isn’t possible to do better with the high-level variables chosen. Going back to reproducibility and selection of high-level variables: if we’ve omitted some high-level variable which really does impact the results we’re interested in, then “we have no idea what will happen going forward” really is the right answer.
Thanks again.
I think I need to think more about the likelihood issue. I still feel like we might be thinking about different things—when you say “a deterministic model which uses fundamental physics”, this would not be in the set of models that we could afford to run to make predictions for complex systems. For the models we could afford to run, it seems to me that no choice of initial conditions would lead them to match the data we observe, except by extreme coincidence (analogous to a simple polynomial just happening to pass through all the datapoints produced by a much more complex function).
I’ve gone through Jaynes’ paper now from the link you gave. His point about deciding what macroscopic variables matter is well-made. But you still need a model of how the macroscopic variables you observe relate to the ones you want to predict. In modelling atmospheric processes, simple spatial averaging of the fluid dynamics equations over resolved spatial scales gets you some way, but then changing the form of the function relating the future to present states (“adding representations of processes” as I put it before) adds additional skill. And Jaynes’ paper doesn’t seem to say how you should choose this function.
Ok, let’s talk about computing with error bars, because it sounds like that’s what’s missing from what you’re picturing.
The usual starting point is linear error—we assume that errors are small enough for linear approximation to be valid. (After this we’ll talk about how to remove that assumption.) We have some multivariate function $f(x)$ - imagine that $x$ is the full state of our simulation at some timestep, and $f$ calculates the state at the next timestep. The value $\bar{x}$ of $x$ in our program is really just an estimate of the “true” value $x$; it has some error $\Delta x = x - \bar{x}$. As a result, the value $\bar{f}$ of $f$ in our program also has some error $\Delta f = f - \bar{f}$. Assuming the error is small enough for linear approximation to hold, we have:

$$\Delta f = f - \bar{f} = f(\bar{x} + \Delta x) - f(\bar{x}) \approx \left(\frac{df}{dx}\bigg|_{\bar{x}}\right)\Delta x$$

where $\frac{df}{dx}$ is the Jacobian, i.e. the matrix of derivatives of every entry of $f(x)$ with respect to every entry of $x$.
Next, assume that $\Delta x$ has covariance matrix $\Sigma_x$, and we want to compute the covariance matrix $\Sigma_f$ of $\Delta f$. We have a linear relationship between $\Delta x$ and $\Delta f$, so we use the usual formula for linear transformation of covariance:

$$\Sigma_f = \left(\frac{df}{dx}\right)^T \Sigma_x \left(\frac{df}{dx}\right)$$
Now imagine iterating this at every timestep: we compute the timestep itself, then differentiate that timestep, and matrix-multiply our previous uncertainty on both sides by the derivative matrix to get the new uncertainty:

$$\bar{x}(t+1) = f(\bar{x}(t))$$

$$\Sigma_x(t+1) = \left(\frac{df}{dx}\bigg|_{\bar{x}(t)}\right)^T \Sigma_x(t) \left(\frac{df}{dx}\bigg|_{\bar{x}(t)}\right)$$
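As a concrete sketch of that predict-and-propagate loop (toy dynamics, nothing climate-specific), with the Jacobian taken by finite differences. One note on orientation: the code below uses the convention J[i, j] = df_i/dx_j, under which the covariance transforms as J Sigma J^T.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian with J[i, j] = d f_i / d x_j."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - fx) / eps
    return J

def step_with_uncertainty(f, x_bar, Sigma):
    """Propagate a point estimate and its covariance one timestep.

    With J[i, j] = d f_i / d x_j, the linearized error transforms as
    Delta_f = J Delta_x, so the covariance transforms as J Sigma J^T.
    """
    J = jacobian(f, x_bar)
    return f(x_bar), J @ Sigma @ J.T

# Toy nonlinear dynamics (not a climate model): a chaotic logistic map
# weakly coupled to a damped second coordinate.
def f(x):
    return np.array([3.9 * x[0] * (1.0 - x[0]), 0.9 * x[1] + 0.1 * x[0]])

x, Sigma = np.array([0.3, 0.0]), 1e-8 * np.eye(2)
for _ in range(15):
    x, Sigma = step_with_uncertainty(f, x, Sigma)
print(np.sqrt(np.diag(Sigma)))  # per-coordinate uncertainty after 15 steps
```

In a real simulation the Jacobian would come from autodiff as a linear operator rather than finite differences over an explicit matrix, but the update is the same.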
Now, a few key things to note:
For most systems of interest, that uncertainty is going to grow over time, usually exponentially. That’s correct: in a chaotic system, if the initial conditions are uncertain, then of course we should become more and more uncertain about the system’s state over time.
Those formulas only propagate uncertainty in previous state to uncertainty in the next state. Really, there’s also new uncertainty introduced at each timestep, e.g. from error in f itself (i.e. due to averaging) or from whatever’s driving the system. Typically, such errors are introduced as an additive term—i.e. we compute the covariance in x introduced by each source of error, and add them to the propagated covariance matrix at each timestep.
Actually storing the whole covariance matrix would take $O(n^2)$ space if $x$ has $n$ elements, which is completely impractical when $x$ is the whole state of a finite element simulation. We make this practical the same way we make all matrix operations practical in numerical computing: exploit sparsity/structure. This is application-specific, but usually the covariance can be well-approximated as the sum of sparse “local” covariances and low-rank “global” covariances.
Likewise with the update: we don’t actually want to compute the n-by-n derivative matrix and then matrix-multiply with the covariance. Most backpropagation libraries expose the derivative as a linear operator rather than an explicit matrix, and we want to use it that way. Again, specifics will vary, depending on the structure of f and of the (approximated) covariance matrix.
In many applications, we have data coming in over time. That data reduces our uncertainty every time it comes in—at that point, we effectively have a Kalman filter. If enough data is available, the uncertainty remains small enough for the linear approximation to continue to hold, and the whole thing works great.
If the uncertainty does become too large for linear approximation, then we need to resort to other methods for representing uncertainty, rather than just a covariance matrix. Particle filters are one simple-but-effective fallback, and can be combined with linear uncertainty as well.
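A minimal scalar sketch of the predict/update loop described above (all numbers made up): additive process noise Q inflates the variance at each predict step, and each measurement shrinks it again, so the uncertainty settles at a steady state instead of growing without bound.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up scalar system: x(t+1) = a*x(t) + process noise, observed noisily.
a, Q, R = 1.05, 0.01, 0.25  # dynamics gain, process-noise var, obs-noise var
x_true, x_hat, P = 0.0, 0.0, 1.0  # truth, estimate, estimate variance

for _ in range(50):
    # True system evolves; we receive a noisy measurement z.
    x_true = a * x_true + rng.normal(0.0, np.sqrt(Q))
    z = x_true + rng.normal(0.0, np.sqrt(R))

    # Predict: propagate the estimate; variance grows to a^2 * P plus Q.
    x_hat, P = a * x_hat, a * a * P + Q

    # Update: fold in the measurement; each observation shrinks P.
    K = P / (P + R)  # Kalman gain
    x_hat, P = x_hat + K * (z - x_hat), (1.0 - K) * P

print(x_hat, P)  # P converges to a steady state well below its initial 1.0
```

Without the update step, P would grow exponentially here (the dynamics are unstable), which is the “uncertainty grows over time” behavior from the first note.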
In general, if this sounds interesting and you want to know more, it’s covered in a lot of different contexts. I first saw most of it in an autonomous vehicles course; besides robotics, it’s also heavily used in economic models, and sometimes systems/control theory courses will focus on this sort of stuff.
Is this starting to sound like a model for which the observed data would have nonzero probability?
Do you mean you’d be adding the probability distribution with that covariance matrix on top of the mean prediction from f, to make it a probabilistic prediction? I was talking about deterministic predictions before, though my text doesn’t make that clear. For probabilistic models, yes, adding an uncertainty distribution may result in non-zero likelihoods. But if we know the true dynamics are deterministic (pretend there are no quantum effects, which are largely irrelevant for our prediction errors for systems in the classical physics domain), then we still know the model is not true, and so it seems difficult to interpret the resulting probabilities if we were to do Bayesian updating.
Likelihoods are also not obviously (to me) very good measures of model quality for chaotic systems, either—in these cases we know that even if we had the true model, its predictions would diverge from reality due to errors in the initial condition estimates, but it would trace out the correct attractor—and it’s the attractor geometry (conditional on boundary conditions) that we’d seem to really want to assess. Perhaps then it would have a higher likelihood than every other model, but it’s not obvious to me, and it’s not obvious that there’s not a better metric for leading to good inferences when we don’t have the true model.
Basically the logic that says to use Bayes for deducing the truth does not seem to carry over in an obvious way (to me) to the case when we want to predict but can’t use the true model.
Ah, that’s where we need to apply more Bayes. The underlying system may be deterministic at the macroscopic level, but that does not mean we have perfect knowledge of all the things which effect the system’s trajectory. Most of the uncertainty in e.g. a weather model would not be quantum noise, it would be things like initial conditions, measurement noise (e.g. how close is this measurement to the actual average over this whole volume?), approximation errors (e.g. from discretization of the dynamics), driving conditions (are we accounting for small variations in sunlight or tidal forces?), etc. The true dynamics may be deterministic, but that doesn’t mean that our estimates of all the things which go into those dynamics have no uncertainty. If the inputs have uncertainty (which of course they do), then the outputs also have uncertainty.
The main point of probabilistic models is not to handle “random” behavior in the environment, it’s to quantify uncertainty resulting from our own (lack of) knowledge of the system’s inputs/parameters.
Yeah, you’re pointing to an important issue here, although it’s not actually likelihoods which are the problem—it’s point estimates. In particular, that makes linear approximations a potential issue, since they’re implicitly approximations around a point estimate. Something like a particle filter will do a much better job than a Kalman filter at tracing out an attractor, since it accounts for nonlinearity much better.
Anyway, reasoning with likelihoods and posterior distributions remains valid regardless of whether we’re using point estimates. When the system is chaotic but has an attractor, the posterior probability of the system state will end up smeared pretty evenly over the whole attractor. (Although with enough fine-grained data, we can keep track of roughly where on the attractor the system is at each time, which is why Kalman-type models work well in that case.)
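A minimal bootstrap particle filter sketch (toy logistic-map dynamics, made-up noise levels), illustrating how a particle cloud handles the nonlinearity with no linearization step at all:

```python
import numpy as np

rng = np.random.default_rng(3)

def step(x):
    """Chaotic logistic map (a stand-in for nonlinear dynamics)."""
    return 3.9 * x * (1.0 - x)

# Simulate a "true" trajectory and noisy observations of it (toy setup).
T, obs_var = 30, 0.01
x_true, obs = 0.37, []
for _ in range(T):
    x_true = step(x_true)
    obs.append(x_true + rng.normal(0.0, np.sqrt(obs_var)))

# Bootstrap particle filter: propagate, weight by likelihood, resample.
# The particle cloud can follow a nonlinear, even multimodal, posterior
# instead of a single Gaussian blob around a point estimate.
N = 2000
particles = rng.uniform(0.0, 1.0, N)  # broad prior over the initial state
for z in obs:
    # Propagate each particle, with a little jitter as process noise,
    # clipped to the map's domain.
    particles = np.clip(step(particles) + rng.normal(0.0, 0.01, N), 0.0, 1.0)
    w = np.exp(-0.5 * (z - particles) ** 2 / obs_var)  # likelihood weights
    particles = rng.choice(particles, size=N, p=w / w.sum())  # resample

print(particles.mean(), particles.std())
```

If the system spent time wandering an attractor with sparse observations, the cloud would smear out over the attractor, which is exactly the behavior described above.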
I’m confused. Are you really claiming that modeling the Earth’s climate can be written down “efficiently”? What exactly do you mean by ‘efficiently’? What would a sketch of an efficient description of the “true model space” for the Earth’s climate be?
Extreme answer: just point AIXI at wikipedia. That’s a bit tongue-in-cheek, but it illustrates the concepts well. The actual models (i.e. AIXI) can be very general and compact; rather than AIXI, a specification of low-level physics would be a more realistic model to use for climate. Most of the complexity of the system is then learned from data—i.e. historical weather data, a topo map of the Earth, composition of air/soil/water samples, etc. An exact Bayesian update of a low-level physical model on all that data should be quite sufficient to get a solid climate model; it wouldn’t even take an unrealistic amount of data (data already available online would likely suffice). The problem is that we can’t efficiently compute that update, or efficiently represent the updated model—we’re talking about a joint distribution over positions and momenta of every particle comprising the Earth, and that’s even before we account for quantum. But the prior distribution over positions and momenta of every particle we can represent easily—just use something maxentropic, and the data will be enough to figure out the (relevant parts of the) rest.
So to answer your specific questions:
the “true model space” is just low-level physics
by “efficiently”, I mean the code would be writable by a human and the “training” data would easily fit on your hard drive
Can we reduce the issue of “we can’t efficiently compute that update” by adding sensors?
What if we could get more data? If facing this type of difficulty, that’s the question I would ask first.
Yeah, the usual mechanism by which more data reduces computational difficulty is by directly identifying the values some previously-latent variables. If we know the value of a variable precisely, then that’s easy to represent; the difficult-to-represent distributions are those where there’s a bunch of variables whose uncertainty is large and tightly coupled.
No, he’s referring to something like performing a Bayesian update over all computable hypotheses – that’s incomputable (i.e. even in theory). It’s infinitely beyond the capabilities of even a quantum computer the size of the universe.
Think of it as a kind of (theoretical) ‘upper bound’ on the problem. None of the actual computable (i.e. on real-world computers built by humans) approximations to AIXI are very good in practice.
The AIXI thing was a joke; a Bayesian update on low-level physics with unknown initial conditions would be superexponentially slow, but it certainly isn’t uncomputable. And the distinction does matter—uncomputability usually indicates fundamental barriers even to approximation, whereas superexponential slowness does not (at least in this case).
That’s what I thought you might have meant.
In a sense, existing climate models are already “low-level physics” except that “low-level” means coarse aggregates of climate/weather measurements that are so big that they don’t include tropical cyclones! And, IIRC, those models are so expensive to compute that they can only be computed on supercomputers!
But I’m still confused as to whether you’re claiming that someone could implement AIXI and feed it all the data you mentioned.
You seem to be claiming that “Wikipedia” (or all of the scientific data ever measured) would be enough to generate “the prior distribution over positions and momenta of every particle” and that this data would easily fit on a hard drive. Or are you claiming that such an efficient representation exists in theory? I’m still skeptical of the latter.
This makes me believe that you’re referring to some kind of theoretical algorithm. I understood the asker to wanting something (efficiently) computable, at least relative to actual current climate models (i.e. something requiring no more than supercomputers to use).
That was a joke, but computable approximations of AIXI can certainly be implemented. For instance, a logical inductor run on all that data would be conceptually similar for our purposes.
No, wikipedia or a bunch of scientific data (much less than all the scientific data ever measured), would be enough data to train a solid climate model from a simple prior over particle distributions and momenta. It would definitely not be enough to learn the position and momentum of every particle; a key point of stat mech is that we do not need to learn the position and momentum of every particle in order to make macroscopic predictions. A simple maxentropic prior over microscopic states plus a (relatively) small amount of macroscopic data is enough to make macroscopic predictions.
The code itself need not be theoretical, but it would definitely be superexponentially slow to run. Making it efficient is where stat mech, multiscale modelling, etc come in. The point I want to make is that the system’s “complexity” is not a fundamental barrier requiring fundamentally different epistemic principles.
That’s clearer to me, but I’m still skeptical that that’s in fact possible. I don’t understand how the prior can be considered “over particle distributions and momenta”, except via the theories and models of statistical mechanics, i.e. assuming that those microscopic details can be ignored.
I agree with this. But I think you’re eliding how much work is involved in what you described as:
I wouldn’t think that standard statistical mechanics would be sufficient for modeling the Earth’s climate. I’d expect fluid dynamics is also important as well as chemistry, geology, the dynamics of the Sun, etc.. It’s not obvious to me that statistical mechanics would be effective alone in practice.
Ah… I’m talking about stat mech in a broader sense than I think you’re imagining. The central problem of the field is the “bridge laws” defining/expressing macroscopic behavior in terms of microscopic behavior. So, e.g., deriving Navier-Stokes from molecular dynamics is a stat mech problem. Of course we still need the other sciences (chemistry, geology, etc) to define the system in the first place. The point of stat mech is to take low-level laws with lots of degrees of freedom, and derive macroscopic laws from them. For very coarse, high-level models, the “low-level model” might itself be e.g. fluid dynamics.
Yeah, this stuff definitely isn’t easy. As you argued above, the general case of the problem is basically AGI (and also the topic of my own research). But there are a lot of existing tricks and the occasional reasonably-general-tool, especially in the multiscale modelling world and in Bayesian stat mech.
Yes, I don’t think we really disagree. My prior (prior to this extended comments discussion) was that there are lots of wonderful existing tricks, but there’s no real shortcut for the fully general problem and any such shortcut would be effectively AGI anyways.
Just as as aside, a typical modern climate model will simulate tropical cyclones as emergent phenomena from the coarse-scale fluid dynamics, albeit not enough of the most intense ones. Though, much smaller tropical thunderstorm-like systems are much more crudely represented.
Tangential, but now I’m curious… do you know what discretization methods are typically used for the fluid dynamics? I ask because insufficiently-intense cyclones sound like exactly the sort of thing APIC methods were made to fix, but those are relatively recent and I don’t have a sense for how much adoption they’ve had outside of graphics.
There’s a mixture—finite differencing used to be used a lot but seems to be less common now; semi-Lagrangian advection seems to have taken over in models that used finite differencing; and some models work by doing most of the computations in spectral space and neglecting the smallest spatial scales. More recently, newer methods have been developed to work better on massively parallel computers. It’s not my area, though, so I can’t give a very expert answer—but I’m pretty sure the people working on it think hard about trying not to smooth out intense structures (though that has to be balanced against maintaining numerical stability).
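For anyone unfamiliar, here’s a minimal sketch of the semi-Lagrangian idea (my own toy 1D version, not code from any climate model): instead of differencing fluxes, each grid point traces the flow backward and interpolates the upstream value, which is what lets the scheme stay stable at time steps where explicit differencing would blow up.

```python
import math

def semi_lagrangian_step(field, u, dx, dt):
    """One step of 1D semi-Lagrangian advection on a periodic grid.

    For each grid point, trace the characteristic back a distance u*dt
    and linearly interpolate the field at the departure point. Stable
    even when the CFL number u*dt/dx exceeds 1.
    """
    n = len(field)
    new = [0.0] * n
    for j in range(n):
        x_dep = (j - u * dt / dx) % n      # departure point (grid coords)
        j0 = int(math.floor(x_dep))
        frac = x_dep - j0
        new[j] = (1 - frac) * field[j0 % n] + frac * field[(j0 + 1) % n]
    return new

# Advect a Gaussian bump once around a periodic domain.
n, dx, u = 100, 1.0, 1.0
dt = 2.5  # CFL = 2.5: an explicit upwind scheme would be unstable here
field = [math.exp(-((i - 50) * dx) ** 2 / 50.0) for i in range(n)]
steps = int(n * dx / (u * dt))  # 40 steps -> exactly one full revolution
for _ in range(steps):
    field = semi_lagrangian_step(field, u, dx, dt)
print(max(field))  # peak survives, somewhat smoothed by interpolation
```

Note the smoothing: the linear interpolation damps the peak a bit each step, which is exactly the “smoothing out intense structures” trade-off mentioned above (real models use higher-order interpolation to reduce it).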
How much are ‘graphical’ methods like APIC incorporated elsewhere in general?
My intuition has certainly been pumped to the effect that models that mimic visual behavior are likely to be useful more generally, but maybe that’s not a widely shared intuition.
I would have hoped that was the case, but it’s interesting that both large and small ones are apparently not so easily emergent.
I wonder whether the models are so coarse that the cyclones that do emerge are, in a sense, the minimum size. That would readily explain the lack of smaller emergent cyclones. Maybe larger ones don’t emerge because the ‘next larger size’ is too big for the models. I’d think ‘scaling’ of eddies in fluids might be informative: What’s the smallest eddy possible in some fluid? What other eddy sizes are observed (or can be modeled)?
Not sure if this was intended to be rhetorical, but a big part of what makes turbulence difficult is that we see eddies at many scales, including very small eddies (at least down to the scale that Navier-Stokes holds). I remember a striking graphic about the onset of turbulence in a pot of boiling water, in which the eddies repeatedly halve in size as certain parameter cutoffs are passed, and the number of eddies eventually diverges—that’s the onset of turbulence.
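A toy stand-in for that halving cascade (assuming the graphic I’m remembering was showing something like the period-doubling route to chaos, which is one standard picture of the onset of turbulence): the logistic map’s attractor doubles in period at successive parameter cutoffs, and those cutoffs accumulate at a finite parameter value, beyond which the dynamics are chaotic.

```python
def attractor_period(r, tol=1e-6):
    """Period of the logistic map x -> r*x*(1-x) attractor (up to 64)."""
    x = 0.5
    for _ in range(2000):            # discard the transient
        x = r * x * (1 - x)
    orbit = [x]
    for _ in range(1, 64):
        x = r * x * (1 - x)
        if abs(x - orbit[0]) < tol:  # returned to start: one full period
            break
        orbit.append(x)
    return len(orbit)

# Period doubles as r crosses successive cutoffs (1 -> 2 -> 4 -> ...);
# the cutoffs accumulate near r ~ 3.5699, the onset of chaos.
for r in (2.8, 3.2, 3.5):
    print(r, attractor_period(r))
```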
Sorry for being unclear – it was definitely not intended to be rhetorical!
Yes, turbulence was exactly what I was thinking about. At some small enough scale, we probably wouldn’t expect to ‘find’ or be able to distinguish eddies. So there’s probably some minimum size. But then is there any pattern or structure to the larger sizes of eddies? For (an almost certainly incorrect) example, maybe all eddies are always a multiple of the minimum size and the multiple is always an integer power of two. Or maybe there is no such ‘discrete quantization’ of eddy sizes, tho eddies always ‘split’ into nested halves (under certain conditions).
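For what it’s worth, the standard turbulence-theory answer as I understand it (Kolmogorov’s 1941 scaling) is a continuous cascade rather than an integer quantization, but with a definite smallest scale set by viscosity. A rough back-of-envelope (the dissipation rate and stirring scale below are illustrative assumptions, not measurements):

```python
# Kolmogorov's 1941 picture: eddies form a continuous cascade from the
# large energy-containing scale L down to the dissipation microscale
# eta = (nu^3 / eps)^(1/4), below which viscosity smears eddies out.
nu = 1.0e-6   # kinematic viscosity of water, m^2/s
eps = 1.0e-2  # energy dissipation rate, W/kg (assumed, order-of-magnitude)
L = 0.1       # stirring scale, m (assumed)

eta = (nu ** 3 / eps) ** 0.25
print(f"Kolmogorov microscale: {eta * 1e6:.0f} micrometers")
print(f"scale separation L/eta: {L / eta:.0f}")
```

So on this picture there is a minimum eddy size, but the sizes in between aren’t quantized—there’s a continuum of scales between L and eta, which is a big part of why turbulence is expensive to resolve.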
It certainly seems the case tho that eddies aren’t possible as emergent phenomena at a scale smaller than the discretization of the approximation itself.
It’s not my area, but I don’t think that’s the case. My impression is that part of what drives very high wind speeds in the strongest hurricanes is convection on the scale of a few km in the eyewall, so models with that sort of spatial resolution can generate realistically strong systems, but that’s ~20x finer than typical climate model resolutions at the moment, so it will be a while before we can simulate those systems routinely (though, some argue we could do it if we had a computer costing a few billion dollars).
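A back-of-envelope on why that resolution gap is so expensive (my arithmetic, assuming an explicit solver whose cost scales like grid points times time steps, with the time step shrinking proportionally via the CFL condition):

```python
# Refining the grid by a factor R multiplies the number of grid points
# per refined dimension by R, and (via CFL) multiplies the number of
# time steps by R as well.
refine = 20
cost_3d = refine ** 4     # refine all three spatial dims + time step
cost_horiz = refine ** 3  # refine only the two horizontal dims + time step
print(cost_3d, cost_horiz)
```

So ~20x finer resolution plausibly means somewhere between ~10^3 and ~10^5 times the compute, depending on what gets refined—roughly consistent with “a computer costing a few billion dollars” relative to today’s climate-modeling machines.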
Thanks! That’s very interesting to me.
It seems like it might be an example of relatively small structures having potentially arbitrarily large long-term effects on the state of the entire system.
It could be the case tho that the overall effects of cyclones are still statistical at the scale of the entire planet’s climate.
Regardless, it’s a great example of the kind of thing for which we don’t yet have good general learning algorithms.