Alternative solution: package all the cosmic observations into a big database. Given an astrophysical theory, instantiate it as a specialized compression program, and invoke it on the database. To select between rival theories, measure the sum of encoded file size plus length of compressor; smaller is better.
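A minimal sketch of the selection rule being proposed, assuming every rival theory has already been packaged as a stand-alone compressor and run on the same shared database; the file names and helper below are purely illustrative.

```python
import os

def mdl_score(compressor_path, encoded_path):
    """Two-part codelength: size of the encoded database plus size of the
    compressor program itself, so a more elaborate theory pays for its own length."""
    return os.path.getsize(encoded_path) + os.path.getsize(compressor_path)

# Hypothetical usage: each theory has produced its own encoded file.
scores = {
    "theory_a": mdl_score("theory_a.py", "observations.theory_a.enc"),
    "theory_b": mdl_score("theory_b.py", "observations.theory_b.enc"),
}
best = min(scores, key=scores.get)
print(best, "wins with", scores[best], "bytes total")
```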
So every theory has to cover all of cosmology? Most astrophysicists study really specific things, and make advances in those; their theories [usually] don’t contradict other theories, but usually don’t make predictions about the entire universe.
I actually don’t think this would work at all. Each event observed is usually uncorrelated (often outside the light cone) with almost all other events. In this case, the room for compression is very small; certainly a pulsar series can be compressed well, but can it be compressed substantially better by better astrophysical theories? I think a program that could actually model that much of the universe would be huge compared to a much more naive compressor, and that extra program length might well exceed any gain in compression.
Also, everything has a margin of error. Is this compression supposed to be lossless? No physical theory will outperform 7zip (or whatever), because to get all the digits right it will need to store those digits, or correction factors of nearly the same size, anyway. If it’s lossy, how are we ensuring that it accepts the right losses, and the right amount? Given these issues, I suspect a model of our observational equipment and the database storage format will compress much better than a cosmological model, and both might under-perform a generic compression utility.
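One standard way a lossless theory-based coder handles the margin-of-error worry is predict-and-correct coding: the theory proposes a value for each measurement and only the prediction error, the “correction factor” above, is stored. A rough sketch, assuming integer-quantized measurements and with `theory_predict` standing in for whatever model is under test:

```python
# Lossless predict-and-correct coding: reconstruction is exact, so nothing is
# lost; but if the theory's predictions are no better than noise, the stored
# corrections are as large as the raw digits and nothing is gained either.

def encode(measurements, theory_predict):
    return [obs - theory_predict(i) for i, obs in enumerate(measurements)]

def decode(residuals, theory_predict):
    return [res + theory_predict(i) for i, res in enumerate(residuals)]

# The residual stream is what actually gets handed to a generic entropy coder;
# small, tightly clustered residuals compress well, large noisy ones don't.
```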
In this case, the researcher would take the current standard compressor/model and make a modification to a single module or component of the software, and then show that the modification leads to improved codelengths.
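A sketch of that workflow; everything here is hypothetical: the module names, the bit counts, and the assumption that the standard compressor is organized as a list of pluggable components.

```python
def total_codelength(modules, database):
    # Each component reports how many bits it needs for its share of the data.
    return sum(module(database) for module in modules)

# Stand-in components with made-up codelengths; real ones would implement models.
def orbit_model(db):          return 4_200_000
def pulsar_model_v1(db):      return 9_100_000
def pulsar_model_v2(db):      return 8_700_000   # the proposed modification
def residual_noise_model(db): return 55_000_000

database = None  # placeholder for the shared observation database
baseline = [orbit_model, pulsar_model_v1, residual_noise_model]
proposed = [orbit_model, pulsar_model_v2, residual_noise_model]

saved = total_codelength(baseline, database) - total_codelength(proposed, database)
print("codelength saved by the modified module:", saved, "bits")
```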
You’re getting at an important subtlety, which is that if the observations include many digits of precision, it will be impossible to achieve good compression in absolute terms. But the absolute rate is irrelevant; the point is to compare theories. So maybe theory A can only achieve 10% compression, but if the previous champion only gets 9%, then theory A should be preferred. That said, I do expect a specialized compressor based on astrophysical theories to outperform 7zip on a database of cosmological observations, though maybe by only a small amount in absolute terms.
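To put invented but plausible numbers on why the absolute rate doesn’t matter:

```python
db_bytes = 10**12                   # pretend the shared database is 1 TB
theory_a = int(db_bytes * 0.90)     # 10% compression
champion = int(db_bytes * 0.91)     # 9% compression
# One percentage point on a 1 TB database is ~10 GB of codelength, far more
# than any plausible difference in the size of the compressor programs.
print((champion - theory_a) / 10**9, "GB between the two encodings")
```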
You seem very confident in both those points. Can you justify that? I’m familiar with both generic and specialized compression algorithms (I’ve implemented and reverse-engineered both kinds), and I don’t personally see a way to ensure that (accurate) cosmological models substantially outperform generic compression, at least without loss. On the other hand, I have little experience with astronomy, so please correct me where I’m making inaccurate assumptions.
I’m imagining this database to be structured so that it holds rows along the lines of (datetime, position, luminosity by spectral component). Since I don’t have a background in astronomy, maybe that’s a complete misunderstanding. However, I see this as holding an enormous number of events, each of which carries a small amount of information, and most of which are either unrelated to other events in the database or so trivially related that very simple rules would predict them better than trying to model all of the physical processes occurring in the stars [or whatever] that produced them.
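For concreteness, that row layout might look something like the following; the field names and units are only a guess at a schema, not how real survey archives are organized.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation:
    """One row of the imagined observation database (illustrative fields only)."""
    timestamp: datetime      # when the measurement was taken
    ra_deg: float            # sky position: right ascension, degrees
    dec_deg: float           # sky position: declination, degrees
    band: str                # spectral component, e.g. a filter name
    luminosity: float        # measured brightness in that band
    luminosity_err: float    # the margin of error discussed above
```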
Part of the reason I feel this way is that we can gather so little information; the luminosity of a star varies, and we understand at least something about what can make it vary, but my current impression is that a distant star’s internal processes are so far beyond what we can reconstruct from the little light we receive that most of the variance is expected but not predictable. We don’t even understand our own Sun that well!
There is also the problem of weighting: even granting that an accurate cosmological model would compress well, a model that accurately predicts stellar life cycles but wholly misunderstands the accelerating expansion of the universe would still score much better than one that captures the expansion correctly but is, to even a small degree, less well fitted to observed stellar life cycles (even if the latter is more accurate and less overfitted). Some of the most interesting questions we are investigating right now concern the rarest events; if the database has a row for each observation period, you start with an absolutely enormous number of rows for each observable star, but once-in-a-lifetime events are what really intrigue and confound us. Starting with so little data, compressing them is simply not worth the compressor’s time relative to compressing the much better understood phenomena.
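A back-of-the-envelope version of that weighting worry, with all numbers invented:

```python
import math

# A model that explains routine, well-understood variability slightly better
# saves a sliver of a bit per row, but there are billions of rows; getting a
# once-in-a-lifetime event "right" saves only as many bits as it is surprising.
routine_rows = 5 * 10**9
routine_gain = routine_rows * 0.001                  # 0.001 bits saved per row

p_rare_good_theory = 2**-20   # a theory that roughly anticipates the rare event
p_rare_bad_theory = 2**-40    # a theory that calls it wildly improbable
rare_gain = math.log2(p_rare_good_theory / p_rare_bad_theory)

print(f"better routine model:    {routine_gain:,.0f} bits saved")   # 5,000,000
print(f"better rare-event model: {rare_gain:,.0f} bits saved")      # 20
```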
I think it would be easier to figure out how many bits the data would take than to actually compress it. For example, I can easily work out that something happening 1,001,000 times out of 2,000,000, when that’s how often it should happen, carries 1,999,998.56 bits of information. Actually doing the compression isn’t so easy.
This seems like it would lead to overfitting on the random details of our particular universe, when what we really want (I think) is a theory that equally describes our universe or any sufficiently similar one.
First off, when you have that much data, over-fitting won’t make a big difference. For example, you’ll get a prediction that something happens between 999,000 and 1,001,000 times, instead of exactly 1,000,000. Second, the correct model would take 2,000,000 bits. The incorrect, fitted one would take 1,001,000 × (−ln(0.5005)/ln(2)) + 999,000 × (−ln(0.4995)/ln(2)) = 1,999,998.56 bits. The difference in codelength will always just measure how unlikely it is to be that far from the mean.
Third, and most importantly, no matter how much your intuition says otherwise, this actually is the correct way to do it. The more bits you have to use, the less likely the data. The coincidence might not seem interesting, but that exact sequence of data is unlikely. What normally makes something feel like a coincidence is that there seems to be a way to explain it with a shorter description.
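For anyone who wants to check the second point’s arithmetic, here it is in a few lines of code; the two-outcome setup is just the toy example above.

```python
import math

n_occurrences, n_total = 1_001_000, 2_000_000
n_non = n_total - n_occurrences

def codelength_bits(p):
    """Bits to encode the whole record under a model that gives each trial
    probability p of the event occurring and 1 - p of it not occurring."""
    return -n_occurrences * math.log2(p) - n_non * math.log2(1 - p)

print(round(codelength_bits(0.5), 2))      # "correct" model: 2000000.0 bits
print(round(codelength_bits(0.5005), 2))   # fitted model:    1999998.56 bits
# The fitted model saves only ~1.44 bits, and writing its extra parameter into
# the compressor typically costs more than that, so the overfit doesn't pay.
```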
Can someone else explain this better?
I agree that this is an attractive alternative solution.
And allow me to rephrase. Since human scientists stick too much to the first hypothesis that seems to fit the data (confirmation bias) and have a regrettable tendency to unfairly promote hypotheses that they and their friends discovered (motivated cognition, the motivation being the fame that comes from being known as the discoverer of an important successful hypothesis), it would be a win for the enterprise of science to move, where possible, to having algorithms generate the hypotheses.
Since the hypotheses “found” (more accurately, “promoted to prominence” or “favored”) by the algorithms will be expressed in formal language, professionals with scientific skills, PhDs, tenure, and such will still be needed to translate them into English. Professionals will also still be necessary to refine the hypothesis-finding (actually “hypothesis-favoring”) algorithms and to identify good opportunities for collecting more observations.
How do you come up with astrophysical theories in the first place? Ask a human, an AI, or just test every single computable possibility?
The compression method doesn’t specify that part; this shouldn’t be considered a weakness, since the traditional method doesn’t either. Both methods depend on human intuition, strokes of genius, falling apples, etc.