A Primer On Chaos
Consider a billiards table:
The particular billiards table we’ll use is one we dug out of the physicists’ supply closet, nestled in between a spherical cow and the edge of an infinite plane. The billiard balls are all frictionless, perfectly spherical, bounce perfectly elastically off of other balls and the edges of the table, etc.
Fun fact about billiard balls: if our aim has a tiny bit of error to it, and we hit a ball at ever-so-slightly the wrong angle, that error will grow exponentially as the balls collide. Picture it like this: we start with an evenly-spaced line of balls on the table.
We try to shoot straight along the line, but the angle is off by a tiny amount, call it Δθ.
The ball rolls forward, and hits the next ball in line. The distance by which it’s off is roughly the ball-spacing length L multiplied by Δθ, so LΔθ.
Since the first ball hits the second ball off-center, the second ball will also have some error in its angle. We do a little geometry, and find that the angular error in the second ball is roughly (L/2R)Δθ, where R is the radius of a ball.
Now the second ball rolls into the third. The math is exactly the same as before, except the initial error is now multiplied by a factor L/2R. So when the second ball hits the third, the angular error in the third ball will be multiplied again, yielding error (L/2R)²Δθ. And so forth.
Upshot of all this: in a billiard-ball system, small angular uncertainty grows exponentially with the number of collisions.
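To get a rough feel for the numbers, here's a minimal Python sketch of that amplification argument. The spacing, radius, and initial error below are made-up illustrative values, not anything measured off a real table:

```python
# Toy sketch: each collision multiplies the angular error by roughly L/(2R),
# so a tiny aiming error becomes macroscopic after a dozen-ish collisions.
# All numbers are illustrative placeholders.

L = 0.30        # spacing between balls, in meters (assumed)
R = 0.028       # ball radius, in meters (assumed)
dtheta = 1e-9   # initial aiming error, in radians (assumed)

error = dtheta
for n in range(1, 13):
    error *= L / (2 * R)    # amplification factor per collision
    print(f"after collision {n:2d}: angular error ~ {error:.3e} rad")
```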
In fact, this simplified head-on collision scenario yields the slowest exponential growth. If the balls are hitting at random angles, then the uncertainty grows even faster.
This is a prototypical example of mathematical chaos: small errors grow exponentially as the system evolves over time. Given even a tiny amount of uncertainty in the initial conditions (or a tiny amount of noise from air molecules, or a tiny amount of noise from an uneven table surface, or …), the uncertainty grows, so we are unable to precisely forecast ball-positions far in the future. As we forecast further into the future, our predictions become highly uncertain.
An Information-Theoretic Point Of View
At first glance, chaos poses a small puzzle. The dynamics of billiard-balls are reversible: if we exactly know all the ball positions and velocities at one time, we can exactly calculate all the ball positions and velocities at an earlier or later time. In that sense, no information is lost. Why, then, do our predictions become highly uncertain?
The key is that angles and positions and velocities are real numbers, and specifying a real number takes an infinite number of digits—e.g. the angle of the first ball in our line of billiards balls might be θ = 0.0013617356590430716…. Even though it’s a small real number, it still has an infinite number of digits. And as the billiards system evolves, digits further and further back in the decimal expansion become relevant to the large-scale system behavior. But the digits far back in the decimal expansion are exactly the digits about which we have approximately-zero information, so as time runs forward, we become highly uncertain about the system state.
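To make the digit-shifting picture concrete, here's a tiny sketch using the doubling map x → 2x mod 1, a standard toy model of this effect (the billiard dynamics are messier, but the digit-mixing story is the same in spirit). Each step shifts the binary expansion one place to the left, so the digit that determines the large-scale state at step n is the n-th digit of the initial condition:

```python
from fractions import Fraction

# Doubling map x -> 2x mod 1: each step shifts the binary expansion of x one
# place to the left, so ever-deeper digits of the initial condition become the
# leading digit of the current state. Exact rational arithmetic avoids float
# roundoff hiding the effect.

bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0]   # digits of x0
x = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))        # x0 = 0.b1 b2 b3 ...

for n in range(len(bits)):
    leading = int(x >= Fraction(1, 2))   # leading binary digit of the current state
    assert leading == bits[n]            # ...is digit n+1 of the *initial* state
    x = (2 * x) % 1                      # one step of the doubling map

print("each step's leading digit came from one digit deeper in the initial condition")
```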
Conversely, our initial information about the large-scale system behavior still tells us a lot about the future state, but most of what it tells us is about digits far back in the decimal expansion of the future state variables (i.e. positions and velocities). Another way to put it: initially we have very precise information about the leading-order digits, but near-zero information about the lower-order digits further back. As the system evolves, these mix together. We end up with a lot of information about the leading-order and lower-order digits combined, but very little information about either one individually. (Classic example of how we can have lots of information about two variables combined but little information about either individually: I flip two coins in secret, then tell you that the two outcomes were the same. All the information is about the relationship between the two variables, not about the individual values.) So, even though we have a lot of information about the microscopic system state, our predictions about large-scale behavior (i.e. the leading-order digits) are highly uncertain.
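The two-coin example can be checked by brute-force enumeration. Here's a minimal sketch: the mutual information between either individual coin and the "same or different" observation is zero, while the mutual information with the pair is a full bit.

```python
from collections import Counter
from itertools import product
from math import log2

# Two secret fair coin flips; you're told only whether they matched. Knowing
# "same or different" carries zero information about either coin alone, but a
# full bit about the pair. Brute-force enumeration over the four outcomes.

outcomes = list(product([0, 1], repeat=2))   # (coin1, coin2), each with prob 1/4

def mutual_info(f, g):
    """I(f(outcome); g(outcome)) in bits, under the uniform distribution."""
    pf, pg, pfg = Counter(), Counter(), Counter()
    for o in outcomes:
        pf[f(o)] += 1 / 4
        pg[g(o)] += 1 / 4
        pfg[(f(o), g(o))] += 1 / 4
    return sum(p * log2(p / (pf[a] * pg[b])) for (a, b), p in pfg.items())

same = lambda o: o[0] == o[1]
print("I(coin1         ; same) =", mutual_info(lambda o: o[0], same))  # 0.0
print("I(coin2         ; same) =", mutual_info(lambda o: o[1], same))  # 0.0
print("I((coin1, coin2); same) =", mutual_info(lambda o: o, same))     # 1.0
```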
Conserved Quantities
In our billiards example, not quite all large-scale information is lost to chaos. Assuming no friction and perfectly elastic collisions (remember, we dug this billiard table out of the physicists’ supply closet), energy is perfectly conserved over time. If we know the initial energy, we know the energy later on. If we approximately know the initial energy—the first two digits of the initial energy measured in Joules, for instance—then we approximately know the energy later on. That particular large-scale information is not lost to chaos; our uncertainty about the system’s energy does not grow over time.
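For concreteness, here's a minimal sketch of a single frictionless, equal-mass elastic collision (the standard impulse-along-the-line-of-centers update), checking that kinetic energy comes out the same no matter how the balls hit. The random velocities and contact directions are just arbitrary test cases:

```python
import numpy as np

# One frictionless elastic collision between two equal-mass balls: swap the
# velocity components along the line of centers. Kinetic energy (with m = 1)
# is exactly conserved regardless of how the balls hit. Velocities and contact
# directions below are arbitrary test cases.

rng = np.random.default_rng(0)

def collide(v1, v2, n):
    """Equal-mass elastic collision; n is the unit vector along the line of centers."""
    impulse = np.dot(v1 - v2, n) * n
    return v1 - impulse, v2 + impulse

for _ in range(5):
    v1, v2 = rng.normal(size=2), rng.normal(size=2)
    n = rng.normal(size=2)
    n /= np.linalg.norm(n)
    w1, w2 = collide(v1, v2, n)
    before = np.dot(v1, v1) + np.dot(v2, v2)   # 2 * kinetic energy before
    after = np.dot(w1, w1) + np.dot(w2, w2)    # 2 * kinetic energy after
    print(f"energy before: {before:.6f}   after: {after:.6f}")
```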
More generally: in chaotic systems, there are typically some conserved quantities. Usually, as we predict further and further into the future, our (large-scale) predictions become maximally uncertain except for the conserved quantities.
Here are four other quantities conserved over time in our billiards system:
Number of balls
Area of the table
Number of balls of each color
Color of the table
In the context of the billiards table, these four quantities probably seem “less interesting” than energy, at least at first glance. That’s because the “variables” we were thinking about were mainly positions and velocities of the balls, and quantities like number of balls, area of the table, etc., do not depend on the positions and velocities of the balls. Of course the color of the table stays the same as the balls’ positions and velocities change, so of course our uncertainty about the color of the table stays the same as we become more uncertain about the balls’ positions and velocities.
… but the billiards table is usually used as a simplification of a more widely applicable model: an ideal gas. In the context of gasses, we often do things like add or remove gas from a container (i.e. change the number of balls), compress or expand gas in a piston (i.e. change the area of the table), or induce chemical reactions (i.e. change number of balls of each color). When we make those sorts of changes, things like number of balls and area of table do start to interact with the positions and velocities. And those quantities are still conserved by the balls’ dynamics (even if we’re changing them externally), so information about them won’t be wiped out by chaos.
An Interesting And Powerful Observation
Suppose we start off our billiards table with 10 balls at some average initial energy. (For the large majority of possible starting states, it doesn’t matter how that initial energy is distributed, since chaos will wipe out the details quickly anyway.) We let it bounce around a while, then add another 10 balls at rest, then adjust the table to half of its previous size. We want to predict the average energy of the balls at the end.
What’s interesting is that we can predict the average energy of the balls at the end, reasonably-accurately and robustly, just from the change in number of balls and table size. The average energy of the balls at the end is approximately-independent of the initial positions and velocities of all the individual balls; it only depends on initial energy, initial ball-number and table-area, the changes in ball-number and table-area, and the average energy of the added balls.
To put it in terms of gasses: empirically, we can reasonably-precisely and robustly predict the final temperature of a gas in an insulated piston from its initial temperature, particle-count and piston-volume, the change in particle-count and piston-volume, and the average energy of the added gas.
In other words: in billiards/gasses, we find that
We only need information about quantities conserved by the system’s dynamics to predict such conserved quantities later on, even when we’re externally changing those quantities.
Other than the quantities conserved by the system’s dynamics, all other information is wiped out by chaos over time.
Putting those two together: if we have even just a little initial uncertainty, then everything which we can predict about the system reasonably far into the future can be predicted using only knowledge of the quantities conserved by the dynamics, and how those quantities change.
That’s a remarkably powerful feature. Why does it happen, and how does it generalize?
At a very rough intuitive level, we can argue:
Chaos quickly wipes out all information except for quantities conserved by the system’s dynamics.
Therefore, insofar as the final state is predictable at all, everything predictable about it is predictable from just the conserved quantities.
With that argument in mind, we can see that gasses are at least a little bit special: we can predict the final energy/temperature precisely from conserved quantities of gasses, which is not true of all systems. In general, sometimes the final energy/temperature (or final value of some other conserved quantity) does depend on the details of the initial conditions. For instance, suppose we set up a tiny pressure pad at one spot on the billiards table, and arrange to release a bunch more balls if the pad is hit. Then we need to know whether the pad is hit in order to precisely predict the final energy, and that in turn depends on the details of the initial conditions. It’s not precisely predictable just from the conserved quantities.
But, even with the pressure pad, we can make a slightly weaker claim: insofar as the final energy is predictable at all given a little bit of uncertainty in initial conditions, it’s predictable from just the conserved quantities. In other words, given just the conserved quantities, we can get a distribution over the final energy (based on the probability of the pressure pad being hit), and if we have even just a little uncertainty in the initial conditions, then we cannot predict any better than that distribution.
That’s the main interesting claim to generalize: insofar as we can predict reasonably-far-future times at all, given a little uncertainty in initial conditions, we cannot do any better than predictions based only on the quantities conserved by the system’s dynamics.
Open Problems[1]: Generalizing Chaos
Chaos is generally framed in the language of dynamical systems: some state x, which varies as a function of time x(t), according to some update rule
x(t+dt)=f(x(t))
Unfortunately this setup is not particularly well suited to a lot of the interesting problems to which the intuitive stories of chaos seem potentially applicable.
Minor Generalization: Drop Time-Symmetry
A first natural generalization is to allow the update rule to change over time:
x(t+dt)=f(x(t),t)
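For concreteness, here's a minimal sketch of what that looks like: the iteration loop is unchanged, the update rule just takes t as an extra argument. The logistic-style map and the slowly drifting parameter are arbitrary illustrative choices, not anything specific to billiards:

```python
# Time-dependent update rule x(t+dt) = f(x(t), t): same loop as the autonomous
# case, but the rule sees the time. Two nearby initial conditions are tracked
# to show the chaos-style divergence is still there.

def f(x, t):
    r = 3.7 + 0.2 * (t / 100)          # hypothetical slowly drifting parameter
    return r * x * (1 - x)

x, y = 0.123456, 0.123456 + 1e-12      # two nearby initial conditions
for t in range(100):
    x, y = f(x, t), f(y, t)
    if t % 20 == 0:
        print(t, abs(x - y))           # gap typically grows exponentially, then saturates
```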
I’m sure plenty of mathematicians have worked on this sort of thing before, but I haven’t looked into it much. The core problems, as I see them, are:
Say useful things about the rate of loss of information about high-order bits
Characterize the information not lost, i.e. generalizations of conserved quantities
Find tractable algorithms for predicting (partial) state at one time from (partial) state at other times, via the generalizations of conserved quantities
More Significant Generalization: Drop Synchronous Time Altogether
In practice, we want to make predictions about things which are reasonably far away, not just in time, but in space. We want to take information about one little chunk of the universe, and use it to make predictions about some other little chunk, both somewhere else and somewhen else.
We could try to shoehorn this into a dynamic system model. Let x be the state of the whole spatially-distributed universe, back out whatever information we can get about initial conditions from looking at our one little chunk, then run dynamics forward again to make predictions about another little chunk. But it’s not really a natural language for talking about the problem. Dynamical systems just aren’t a very good language for talking about things separated by space in general; distributed systems don’t play well with a synchronous clock.
A bare-minimum model which handles both space and time well is a circuit (a.k.a. Bayes net, a.k.a. causal model, a.k.a. local equations respecting causality). In that language, rather than information propagating over “time”, it’s natural to think about information propagating through many layers of cuts through the circuit (a.k.a. Markov blankets).
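As a bare-bones illustration of what a “cut” buys us, here's a toy deterministic circuit (the gates are arbitrary placeholders). Given the values on any one layer, everything downstream is computable with no further reference to anything upstream; that screening-off property is the Markov-blanket structure in its simplest form:

```python
# Toy layered circuit: each layer is computed only from the previous layer, so
# any single layer screens off everything before it from everything after it
# (it acts as a cut / Markov blanket). Gate choices are arbitrary.

layers = [
    lambda a: (a[0] ^ a[1], a[1] & a[2], a[2] | a[3]),   # cut 1
    lambda b: (b[0] & b[1], b[1] ^ b[2]),                # cut 2
    lambda c: (c[0] | c[1],),                            # cut 3 (output)
]

def run(state, start_layer=0):
    for layer in layers[start_layer:]:
        state = layer(state)
    return state

x = (1, 0, 1, 1)
cut1 = layers[0](x)
# Everything downstream of cut 1 is computable from cut 1 alone,
# with no further reference to the original inputs:
assert run(x) == run(cut1, start_layer=1)
print("output:", run(x))
```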
Again, the core problems are:
Say useful things about the rate of loss of information about high-order bits across a sequence of cuts/Markov blankets
Characterize the information not lost, i.e. generalizations of conserved quantities across cuts/Markov blankets
Find tractable algorithms for predicting (partial) state of one chunk of the world from (partial) states of other chunks, via the generalizations of conserved quantities
One additional problem, which is much less natural in the dynamical system formulation:
Is there a useful “local” version of information-conservation, i.e. one which looks at the “dynamics” of just a chunk rather than a whole time-slice or other Markov blanket through the circuit? If so, how does “local” information-conservation relate to “global” information-conservation, i.e. conservation across time-slices or other Markov blankets?
This is what a lot of my own research has been about the last couple years, but I’m still not fully satisfied with the results.
Even More General: Drop Graphical Structure
Finally: how might we drop graphical structure more generally? Chaos provides a general story that:
All information except “conserved quantities” is “quickly” “wiped out”
Therefore, insofar as reasonably-“later” “states” are predictable at all, everything predictable about them is predictable just from the “conserved quantities”
Intuitively, it seems like there’s a story here which generalizes to all large prediction problems, or at least a very wide space of them.
An example: Suppose we have a binary function f, with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions—i.e. for each of the 2^1000000 possible inputs x, we flipped a coin to determine the output f(x) for that particular input. Now, suppose we know f, and we know all but 50 of the input bits—i.e. we know 999950 of the input bits. How much information do we have about the output?
Answer: almost none. For almost all such functions, knowing 999950 input bits gives us ~1/2^50 bits of information about the output. More generally, if the function has n input bits and we know all but k of them, then we have ~(1/2)^k bits of information about the output. Our information drops off exponentially with the number of unknown bits. (Rough argument here.)
Intuitively, what’s going on here? Well, random functions are very sensitive to all the input bits; flipping one input bit changes the value of the function about half the time. So, if we’re uncertain about just a few of the input bits, that uncertainty wipes out our information about the output exponentially quickly. Compare to the story in chaotic systems: in a chaotic system, the final state after a reasonable time is very sensitive to lots of the bits of the initial state; flipping one relatively-low-order initial state bit typically changes the large-scale later state. So, if we’re uncertain about just a few of the initial bits (i.e. the low-order bits), that uncertainty wipes out our information about the final state exponentially quickly.
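Here's one way to sanity-check the falloff numerically, without ever writing down a million-bit function: conditioning on the known input bits just selects a block of 2^k independent fair-coin truth-table entries, so we can sample such blocks directly and estimate how much the known bits tell us about the output. The exact constant in front of 2^-k isn't the point; the exponential decay is:

```python
import numpy as np

# Conditioning on all-but-k input bits of a uniformly random boolean function
# leaves 2^k equally likely truth-table entries, so the output is a coin with
# random bias p = (number of 1s among those entries) / 2^k. Information about
# the output is then roughly 1 - E[H(p)]; estimate it by sampling blocks.

rng = np.random.default_rng(0)

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

for k in range(1, 11):
    blocks = rng.integers(0, 2, size=(10_000, 2 ** k), dtype=np.uint8)
    p = blocks.mean(axis=1)                     # conditional P(output = 1) per block
    info = 1.0 - binary_entropy(p).mean()       # ~ bits of info about the output
    print(f"k = {k:2d}   info ~ {info:.5f}   (1/2)^k = {2.0 ** -k:.5f}")
```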
To properly generalize the ideas of chaos to this sort of setup, the core problems are again:
Say useful things about the rate of loss of information with respect to the number of unknown bits
Characterize the information not lost, i.e. generalizations of conserved quantities
Find tractable algorithms for predicting some stuff given some other stuff, via the generalizations of conserved quantities.
One additional problem:
What’s the right language in which to talk about this phenomenon, in general? It seems to go beyond both dynamical systems and circuits, so what’s the right setting?
These problems are basically open to the best of my knowledge. I do not claim that my knowledge is very comprehensive.