So when we say it is “too costly to model from first principles”, we should keep in mind that we don’t mean the true model space can’t even be written down efficiently.
I’m confused. Are you really claiming that modeling the Earth’s climate can be written down “efficiently”? What exactly do you mean by ‘efficiently’? What would a sketch of an efficient description of the “true model space” for the Earth’s climate be?
Extreme answer: just point AIXI at wikipedia. That’s a bit tongue-in-cheek, but it illustrates the concepts well. The actual models (i.e. AIXI) can be very general and compact; rather than AIXI, a specification of low-level physics would be a more realistic model to use for climate. Most of the complexity of the system is then learned from data—i.e. historical weather data, a topo map of the Earth, composition of air/soil/water samples, etc. An exact Bayesian update of a low-level physical model on all that data should be quite sufficient to get a solid climate model; it wouldn’t even take an unrealistic amount of data (data already available online would likely suffice). The problem is that we can’t efficiently compute that update, or efficiently represent the updated model—we’re talking about a joint distribution over positions and momenta of every particle comprising the Earth, and that’s even before we account for quantum. But the prior distribution over positions and momenta of every particle we can represent easily—just use something maxentropic, and the data will be enough to figure out the (relevant parts of the) rest.
So to answer your specific questions:
the “true model space” is just low-level physics
by “efficiently”, I mean the code would be writable by a human and the “training” data would easily fit on your hard drive
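To make that concrete, here is a minimal toy sketch of the “simple maxentropic prior plus data” picture (all sizes, names, and numbers here are hypothetical illustrations, nothing like a real climate setup):

```python
import itertools
import numpy as np

# Toy stand-in for "low-level physics": N binary spins; a microstate is one
# assignment of all N spins.  (Purely illustrative, hypothetical setup.)
N = 12
microstates = np.array(list(itertools.product([-1, 1], repeat=N)))  # 2^N states

# Maxentropic prior given no information: uniform over microstates.
# This is the part that is trivial to write down.
log_prior = np.full(len(microstates), -N * np.log(2))

# "Macroscopic data": a noisy measurement of the total magnetization.
observed_M, noise_sd = 4.0, 1.0
M = microstates.sum(axis=1)
log_lik = -0.5 * ((observed_M - M) / noise_sd) ** 2

# Exact Bayesian update.  Fine for 2^12 = 4096 microstates; the same computation
# over the microstates of the actual Earth is exactly what we cannot do efficiently.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

# A macroscopic prediction from the posterior, without ever pinning down the microstate.
print("posterior mean magnetization:", post @ M)
```

The prior and the update rule fit in a few lines; what blows up is the number of microstates the exact update has to range over.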
Can we reduce the issue of “we can’t efficiently compute that update” by adding sensors? What if we could get more data? If I were facing that kind of difficulty, that’s the question I would ask first.
Yeah, the usual mechanism by which more data reduces computational difficulty is by directly identifying the values of some previously-latent variables. If we know the value of a variable precisely, then that’s easy to represent; the difficult-to-represent distributions are those where there’s a bunch of variables whose uncertainty is large and tightly coupled.
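As a toy illustration of that mechanism (made-up sizes, and a brute-force table just to make the representation cost visible):

```python
# Representing a belief state over n tightly coupled binary variables, naively,
# takes a probability table with 2^n entries.
n = 20
entries_before = 2 ** n  # 1,048,576 numbers

# If data directly pins down the exact values of k of those variables, the
# remaining uncertainty lives on a much smaller table (plus k known values).
k = 12
entries_after = 2 ** (n - k)  # 256 numbers

print("table size before data:", entries_before)
print("table size after data: ", entries_after)
```

Known values are cheap to carry around; large, tightly coupled uncertainty is what is expensive.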
No, he’s referring to something like performing a Bayesian update over all computable hypotheses – that’s incomputable (i.e. even in theory). It’s infinitely beyond the capabilities of even a quantum computer the size of the universe.
Think of it as a kind of (theoretical) ‘upper bound’ on the problem. None of the actual computable (i.e. on real-world computers built by humans) approximations to AIXI are very good in practice.
The AIXI thing was a joke; a Bayesian update on low-level physics with unknown initial conditions would be superexponentially slow, but it certainly isn’t uncomputable. And the distinction does matter—uncomputability usually indicates fundamental barriers even to approximation, whereas superexponential slowness does not (at least in this case).
That’s what I thought you might have meant.
In a sense, existing climate models are already “low-level physics” except that “low-level” means aggregates of climate/weather measurements so coarse that they don’t include tropical cyclones! And, IIRC, those models are so expensive to compute that they can only be run on supercomputers!
But I’m still confused as to whether you’re claiming that someone could implement AIXI and feed it all the data you mentioned.
the prior distribution over positions and momenta of every particle we can represent easily—just use something maxentropic, and the data will be enough to figure out the (relevant parts of the) rest.
You seem to be claiming that “Wikipedia” (or all of the scientific data ever measured) would be enough to generate “the prior distribution over positions and momenta of every particle” and that this data would easily fit on a hard drive. Or are you claiming that such an efficient representation exists in theory? I’m still skeptical of the latter.
The problem is that we can’t efficiently compute that update, or efficiently represent the updated model—we’re talking about a joint distribution over positions and momenta of every particle comprising the Earth, and that’s even before we account for quantum.
This makes me believe that you’re referring to some kind of theoretical algorithm. I understood the asker as wanting something (efficiently) computable, at least relative to actual current climate models (i.e. something requiring no more than supercomputers to use).
But I’m still confused as to whether you’re claiming that someone could implement AIXI and feed it all the data you mentioned.
That was a joke, but computable approximations of AIXI can certainly be implemented. For instance, a logical inductor run on all that data would be conceptually similar for our purposes.
You seem to be claiming that “Wikipedia” (or all of the scientific data ever measured) would be enough to generate “the prior distribution over positions and momenta of every particle” and that this data would easily fit on a hard drive.
No, wikipedia or a bunch of scientific data (much less than all the scientific data ever measured) would be enough data to train a solid climate model from a simple prior over particle distributions and momenta. It would definitely not be enough to learn the position and momentum of every particle; a key point of stat mech is that we do not need to learn the position and momentum of every particle in order to make macroscopic predictions. A simple maxentropic prior over microscopic states plus a (relatively) small amount of macroscopic data is enough to make macroscopic predictions.
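Here is a minimal worked version of that recipe in a toy setting (a handful of made-up energy levels rather than an Earth’s worth of particles): solve for the maxentropic distribution consistent with one measured macroscopic number, then use it to predict other macroscopic quantities.

```python
import numpy as np
from scipy.optimize import brentq

# Toy "microscopic" state space: one particle occupying one of a few energy levels.
# (Hypothetical numbers, purely illustrative.)
energies = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Macroscopic "data": a measured mean energy.
measured_mean = 1.3

def mean_energy(beta):
    # Maximizing entropy subject to a fixed mean energy gives p_i proportional to
    # exp(-beta * E_i); beta is the Lagrange multiplier (an inverse temperature).
    w = np.exp(-beta * energies)
    p = w / w.sum()
    return p @ energies

# Solve for the multiplier that matches the macroscopic constraint.
beta = brentq(lambda b: mean_energy(b) - measured_mean, -10.0, 10.0)
w = np.exp(-beta * energies)
p = w / w.sum()

# Predictions of *other* macroscopic quantities, with the microstate never identified.
print("beta =", beta)
print("P(energy >= 3) =", p[energies >= 3].sum())
print("energy variance =", p @ (energies - measured_mean) ** 2)
```

The macroscopic measurement pins down the multiplier, and other macroscopic predictions follow from the resulting distribution; at no point do we learn which microstate the system is actually in.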
This makes me believe that you’re referring to some kind of theoretical algorithm.
The code itself need not be theoretical, but it would definitely be superexponentially slow to run. Making it efficient is where stat mech, multiscale modelling, etc come in. The point I want to make is that the system’s “complexity” is not a fundamental barrier requiring fundamentally different epistemic principles.
… wikipedia or a bunch of scientific data (much less than all the scientific data ever measured) would be enough data to train a solid climate model from a simple prior over particle distributions and momenta. It would definitely not be enough to learn the position and momentum of every particle; a key point of stat mech is that we do not need to learn the position and momentum of every particle in order to make macroscopic predictions. A simple maxentropic prior over microscopic states plus a (relatively) small amount of macroscopic data is enough to make macroscopic predictions.
That’s clearer to me, but I’m still skeptical that that’s in fact possible. I don’t understand how the prior can be considered “over particle distributions and momenta”, except via the theories and models of statistical mechanics, i.e. assuming that those microscopic details can be ignored.
The point I want to make is that the system’s “complexity” is not a fundamental barrier requiring fundamentally different epistemic principles.
I agree with this. But I think you’re eliding how much work is involved in what you described as:
Making it efficient is where stat mech, multiscale modelling, etc come in.
I wouldn’t think that standard statistical mechanics would be sufficient for modeling the Earth’s climate. I’d expect fluid dynamics is also important, as well as chemistry, geology, the dynamics of the Sun, etc. It’s not obvious to me that statistical mechanics would be effective alone in practice.
Ah… I’m talking about stat mech in a broader sense than I think you’re imagining. The central problem of the field is the “bridge laws” defining/expressing macroscopic behavior in terms of microscopic behavior. So, e.g., deriving Navier-Stokes from molecular dynamics is a stat mech problem. Of course we still need the other sciences (chemistry, geology, etc) to define the system in the first place. The point of stat mech is to take low-level laws with lots of degrees of freedom, and derive macroscopic laws from them. For very coarse, high-level models, the “low-level model” might itself be e.g. fluid dynamics.
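For concreteness, the standard schematic of that kind of bridge law (textbook kinetic theory, nothing specific to climate models): start from the one-particle distribution $f(x,v,t)$ and define the macroscopic fields as its low-order velocity moments,

$$\partial_t f + v\cdot\nabla_x f = C[f], \qquad \rho = m\!\int\! f\,dv, \qquad \rho u = m\!\int\! v\,f\,dv.$$

Because collisions conserve mass and momentum, taking the zeroth and first velocity moments of the equation gives

$$\partial_t \rho + \nabla\cdot(\rho u) = 0, \qquad \partial_t(\rho u) + \nabla\cdot(\rho\, u\otimes u) = -\nabla\cdot\mathbf{P}, \qquad \mathbf{P} = m\!\int\!(v-u)\otimes(v-u)\,f\,dv,$$

and a closure expressing $\mathbf{P}$ in terms of $\rho$ and $u$ (e.g. a Chapman–Enskog expansion around local equilibrium, giving $\mathbf{P}\approx p\,\mathbf{I}-\mu\big(\nabla u+\nabla u^{\mathsf T}-\tfrac{2}{3}(\nabla\cdot u)\,\mathbf{I}\big)$) turns the momentum equation into Navier-Stokes. Most of the hard work is in justifying that closure step.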
I think you’re eliding how much work is involved in what you described as...
Yeah, this stuff definitely isn’t easy. As you argued above, the general case of the problem is basically AGI (and also the topic of my own research). But there are a lot of existing tricks and the occasional reasonably-general tool, especially in the multiscale modelling world and in Bayesian stat mech.
Yes, I don’t think we really disagree. My prior (prior to this extended comments discussion) was that there are lots of wonderful existing tricks, but there’s no real shortcut for the fully general problem and any such shortcut would be effectively AGI anyways.
climate models are already “low-level physics” except that “low-level” means aggregates of climate/weather measurements so coarse that they don’t include tropical cyclones!
Just as an aside, a typical modern climate model will simulate tropical cyclones as emergent phenomena from the coarse-scale fluid dynamics, albeit not enough of the most intense ones. Though, much smaller tropical thunderstorm-like systems are much more crudely represented.
Tangential, but now I’m curious… do you know what discretization methods are typically used for the fluid dynamics? I ask because insufficiently-intense cyclones sound like exactly the sort of thing APIC methods were made to fix, but those are relatively recent and I don’t have a sense for how much adoption they’ve had outside of graphics.
do you know what discretization methods are typically used for the fluid dynamics?
There’s a mixture—finite differencing used to be used a lot but seems to be less common now; semi-Lagrangian advection seems to have taken over from finite differencing in the models that used it; and some models work by doing most of the computations in spectral space and neglecting the smallest spatial scales. More recently, newer methods have been developed to work better on massively parallel computers. It’s not my area, though, so I can’t give a very expert answer—but I’m pretty sure the people working on it think hard about trying not to smooth out intense structures (though that has to be balanced against maintaining numerical stability).
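For anyone unfamiliar with the semi-Lagrangian idea, here is a minimal 1D sketch (constant wind, periodic domain, linear interpolation; purely illustrative, nothing like an operational dynamical core):

```python
import numpy as np

# Semi-Lagrangian advection of a field q by a constant velocity u in 1D.
nx = 200
dx = 1.0 / nx
x = np.arange(nx) * dx
u = 0.37      # advecting velocity (hypothetical)
dt = 0.01     # not limited by a CFL condition, a key selling point of the scheme

# Initial field: a narrow, intense structure.
q = np.exp(-((x - 0.5) / 0.02) ** 2)

for _ in range(100):
    # 1. Trace each grid point back along the flow to its departure point.
    x_dep = (x - u * dt) % 1.0
    # 2. Interpolate the old field at the departure points.  Linear interpolation
    #    (used here for brevity) is quite diffusive; operational models use
    #    higher-order interpolation to limit that smoothing.
    q = np.interp(x_dep, x, q, period=1.0)

print("initial peak: 1.0, peak after advection:", round(q.max(), 3))
```

The interpolation step is where the trade-off mentioned above lives: low-order interpolation is very stable but smears out intense, small-scale structures.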
How much are ‘graphical’ methods like APIC incorporated elsewhere in general?
My intuition has certainly been pumped to the effect that models that mimic visual behavior are likely to be useful more generally, but maybe that’s not a widely shared intuition.
I would have hoped that was the case, but it’s interesting that both large and small ones are apparently not so easily emergent.
I wonder whether the models are so coarse that the cyclones that do emerge are in a sense the minimum size. That would readily explain the lack of smaller emergent cyclones. Maybe larger ones don’t emerge because the ‘next larger size’ is too big for the models. I’d think ‘scaling’ of eddies in fluids might be informative: What’s the smallest eddy possible in some fluid? What other eddy sizes are observed (or can be modeled)?
Not sure if this was intended to be rhetorical, but a big part of what makes turbulence difficult is that we see eddies at many scales, including very small eddies (at least down to the scale at which Navier-Stokes holds). I remember a striking graphic about the onset of turbulence in a pot of boiling water, in which the eddies repeatedly halve in size as certain parameter cutoffs are passed, and the number of eddies eventually diverges—that’s the onset of turbulence.
Sorry for being unclear – it was definitely not intended to be rhetorical!
Yes, turbulence was exactly what I was thinking about. At some small enough scale, we probably wouldn’t expect to ‘find’ or be able to distinguish eddies. So there’s probably some minimum size. But then is there any pattern or structure to the larger sizes of eddies? For (an almost certainly incorrect) example, maybe all eddies are always a multiple of the minimum size and the multiple is always an integer power of two. Or maybe there is no such ‘discrete quantization’ of eddy sizes, tho eddies always ‘split’ into nested halves (under certain conditions).
It certainly seems the case tho that eddies aren’t possible as emergent phenomena at a scale smaller than the discretization of the approximation itself.
I wonder whether the models are so coarse that the cyclones that do emerge are in a sense the minimum size.
It’s not my area, but I don’t think that’s the case. My impression is that part of what drives very high wind speeds in the strongest hurricanes is convection on the scale of a few km in the eyewall, so models with that sort of spatial resolution can generate realistically strong systems, but that’s ~20x finer than typical climate model resolutions at the moment, so it will be a while before we can simulate those systems routinely (though, some argue we could do it if we had a computer costing a few billion dollars).
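A back-of-the-envelope way to see why that factor of ~20 is so expensive (ignoring any extra vertical levels, shorter physics time steps, and I/O): refining the horizontal grid by a factor $R$ multiplies the number of grid columns by $R^2$, and the time step typically has to shrink roughly in proportion to the grid spacing as well, so

$$\text{cost} \sim R^2 \cdot R = R^3, \qquad R \approx 20 \;\Rightarrow\; \text{several thousand times the computation per simulated year.}$$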
Thanks! That’s very interesting to me.
It seems like it might be an example of relatively small structures having potentially arbitrarily large long-term effects on the state of the entire system.
It could be the case tho that the overall effects of cyclones are still statistical at the scale of the entire planet’s climate.
Regardless, it’s a great example of the kind of thing for which we don’t yet have good general learning algorithms.