We will then compare the likelihoods achieved by the two methods; higher likelihood wins. If there is ambiguity concerning the validity of the result, then we will compress the data set using compression algorithms based on the models and compare compression rates. Constructing a compressor from a statistical model is essentially a technical exercise; I can provide a Java implementation of arithmetic encoding. The compression rates must take into account the size of the compressor itself.
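For concreteness, here is a minimal Java sketch of the accounting this implies; the Model interface and class names are placeholders of mine, not part of any agreed harness. An arithmetic coder driven by a statistical model emits output within a few bits of the model's negative log2-likelihood, so comparing compression rates with the compressor's size added in is essentially a penalized likelihood comparison.

public final class IdealCodeLength {

    /** A contestant's statistical model: assigns a probability to each record
     *  conditional on the records before it. */
    public interface Model {
        double probability(Object[] records, int i);
    }

    /** Ideal compressed size in bits, i.e. the negative log2-likelihood;
     *  an arithmetic coder gets within a few bits of this. */
    public static double idealBits(Model model, Object[] records) {
        double bits = 0.0;
        for (int i = 0; i < records.length; i++) {
            bits += -Math.log(model.probability(records, i)) / Math.log(2.0);
        }
        return bits;
    }

    /** The quantity to compare between contestants: compressed size plus the
     *  size of the compressor program itself. */
    public static double totalBits(Model model, Object[] records, long compressorSizeBytes) {
        return idealBits(model, records) + 8.0 * compressorSizeBytes;
    }
}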
Some comments:

“Likelihood” is ambiguous: a Bayes net can be defined with a single hidden node for “object ID”, and all the observed nodes dependent only on the object ID, with conditional probability tables lifted from large look-up tables. This Bayes net would have a low probability from a prior belief distribution over Bayes nets but a high likelihood to the data.
To define compression unambiguously, you should agree on a programming language or executable format, and on runtime and memory bounds on a reference computer, as in the Hutter Prize.
An alternative to a test of compression would be a test of sequential prediction of, e.g., individual real-valued or integer measurements, conditional on the values of previous measurements according to some ordering. For each measurement, the predictor program would generate a predictive probability distribution, and a judge program would normalize the distribution or test that it summed to 1. The predictor’s score would be the log of the product of the sequential likelihoods, i.e. the sum of the per-measurement log-likelihoods. (A sketch of such a judge program appears after these comments.) Compared to arithmetic compressor/decompressor pairs, sequential predictors might be less technically demanding to program (no problem of synchronizing the compressor and decompressor), but might be more technically demanding to judge (a predictive distribution from Monte Carlo sampling may depend on allowed computation time). The size of the predictor might still need to be included.
The Bayes Net contestant should be permitted to use prior belief distributions over network structures (as here). (This can be encoded as a causal belief about a hidden cause of patterns of causation.)
The Bayes Net contestant may wish to use network search algorithms or priors that hypothesize causal models with unobserved latent nodes.
The Bayes Net contestant should remember that the Skeptic contestant can use compression schemes based on reconstructability analysis, which is relatively powerful, and Markov graphical models, which include all simple Bayes nets for purposes of this problem.
[. . .] while the belief network paradigm is mathematically elegant and intuitively appealing, it is NOT very useful for describing real data.
The challenge is to prove the above claim wrong. [. . .]
The challenge hinges on the meaning of the word “real data”. [. . .] Other than that, there are no limitations—it can be image data, text, speech, machine learning sets, Netflix, social science databases, etc.
Some data are about objects whose existence or relations are causally dependent on objects’ properties. Simple Bayesian networks cannot usually represent knowledge of such causal dependencies, but there are more general families of Bayesian models which can. Would it be in the spirit of the challenge for the Bayes Net contestant to use these more general models? Would it be out of the spirit of the challenge for the data to be about such a collection of objects?
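Returning to the sequential-prediction alternative mentioned above, here is a minimal Java sketch of the kind of judge program it imagines. The Predictor interface and the discretization of measurements into integer bins are assumptions of mine, not an agreed contest format; a real harness would also have to fix the measurement ordering and how sampling-based predictors are handled.

import java.util.Arrays;
import java.util.Map;

public final class SequentialJudge {

    /** A contestant's predictor: a distribution over the next (discretized)
     *  measurement, conditional on all previous measurements in the agreed order. */
    public interface Predictor {
        Map<Integer, Double> predictNext(int[] history);
    }

    /** The predictor's score: the log (base 2) of the product of the sequential
     *  likelihoods, accumulated as a sum of per-measurement log-likelihoods. */
    public static double logScore(Predictor predictor, int[] measurements) {
        double score = 0.0;
        for (int i = 0; i < measurements.length; i++) {
            Map<Integer, Double> dist =
                    predictor.predictNext(Arrays.copyOfRange(measurements, 0, i));

            // Judge's check: the distribution must sum to 1, or be normalized,
            // so the predictor cannot gain by leaking probability mass.
            double total = 0.0;
            for (double p : dist.values()) total += p;

            double p = dist.getOrDefault(measurements[i], 0.0);
            if (total > 0.0) p /= total;
            score += Math.log(Math.max(p, Double.MIN_VALUE)) / Math.log(2.0);
        }
        return score;
    }
}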
“Likelihood” is ambiguous: [...] This Bayes net would have a low probability from a prior belief distribution over Bayes nets but a high likelihood to the data.
Right—this would be what I’d call “cheating” or overfitting the data. We’d have to use the compression rate in this case.
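To put rough, invented numbers on that (a toy calculation, not a measurement of any real data set): the memorizing net wins on raw likelihood but loses once its own description length is charged against it.

public final class LookupTableObjection {
    public static void main(String[] args) {
        double records = 100_000;
        double rawBitsPerRecord = 40.0;             // one record stored verbatim

        // A genuine model: a modest program that pays a real, nonzero
        // code length for each record it has not memorized.
        double genuineModelBits = 50_000;           // program plus parameters
        double genuineDataBits  = records * 25.0;   // -log2 likelihood per record

        // The "object ID" look-up table net: each record is nearly free given
        // the tables, but the tables contain roughly the whole data set.
        double tableModelBits = records * rawBitsPerRecord;
        double tableDataBits  = records * 0.01;

        System.out.printf("genuine model total:  %.0f bits%n",
                genuineModelBits + genuineDataBits);
        System.out.printf("look-up table total:  %.0f bits%n",
                tableModelBits + tableDataBits);
        // The look-up table net has the higher likelihood (smaller data term)
        // but the worse total; that is how the compression test catches it.
    }
}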
To define compression unambiguously, you should agree on a programming language or executable format, and on runtime and memory bounds on a reference computer
Sure. I’ll work out the technical details if anyone wants to enter the contest. I would prefer to use the most recent stable JVM. It seems very unlikely to me that the outcome of the contest will depend on the precise selection of time or memory bounds—let’s say, the time bound is O(24 hours) and the memory bound is O(2 GB).
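As one possible way to make those bounds operational, assuming entries are packaged as runnable jars (the file names below are placeholders): launch each entry in its own JVM with a 2 GB heap cap and kill it after 24 hours of wall-clock time.

import java.util.concurrent.TimeUnit;

public final class BoundedRun {
    public static void main(String[] args) throws Exception {
        Process entry = new ProcessBuilder(
                "java", "-Xmx2g",                   // memory bound: 2 GB heap
                "-jar", "entry.jar",                // contestant's compressor
                "dataset.bin", "compressed.bin")    // input and output files
                .inheritIO()
                .start();

        // Time bound: 24 hours of wall-clock time on the reference computer.
        if (!entry.waitFor(24, TimeUnit.HOURS)) {
            entry.destroyForcibly();
            System.out.println("Entry exceeded the time bound.");
        } else {
            System.out.println("Entry finished with exit code " + entry.exitValue());
        }
    }
}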
An alternative to a test of compression
It’s actually not very difficult to implement a compression program using arithmetic coding once you have the statistical model. Other prediction evaluation schemes may work, but compression has methodological crispness: look at the compressed file size, and check that the decompressed data matches the original exactly.
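That crispness is easy to mechanize; here is a sketch of the judging step, with placeholder file names (the decompressor would be the contestant’s own program, run under the agreed bounds beforehand).

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public final class CompressionCheck {
    public static void main(String[] args) throws Exception {
        byte[] original     = Files.readAllBytes(Paths.get("dataset.bin"));
        byte[] roundTripped = Files.readAllBytes(Paths.get("roundtrip.bin"));

        long compressedBytes   = Files.size(Paths.get("compressed.bin"));
        long decompressorBytes = Files.size(Paths.get("decompressor.jar"));

        // The score only counts if the round trip is exact.
        boolean lossless = Arrays.equals(original, roundTripped);
        System.out.println("lossless round trip: " + lossless);
        System.out.println("score in bytes (compressed data + decompressor): "
                + (compressedBytes + decompressorBytes));
    }
}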
Would it be in the spirit of the challenge for the Bayes Net contestant to use these more general models? Would it be out of the spirit of the challenge for the data to be about such a collection of objects?
Basically, when I say “belief networks”, what I mean is the use of graphs to define probability distributions and conditional independence relationships.
The spirit of the contest is to use a truly “natural” data set. I admit that this is a bit vague. Really my only requirement is to use a non-synthetic data set. I think I know where you’re going with the “causally dependent” line of thinking, but it doesn’t bother me too much. I get the feeling that I am walking into a trap, but really I’ve been planning to make a donation to SIAI anyway, so I don’t mind losing.