Your post makes me wonder if you’ve thought about the material/causal conditions that give rise to the production of FI units.
One thing to notice is that in many cases it takes a long period of incubation, conceptual reorganization, and sociological diffusion for the full implications of an FI unit to be recognized. For example, Vapnik and Chervonenkis published the first VC-theory work in 1968, but the Support Vector Machine was not discovered until the 90s. Pearl’s book on causality was published in 2000, but the graphical model framework it depends on dates back at least to the 80s and maybe even as far back as the Chow-Liu algorithm published in 1968. The implication is that the roots of the next set of FIs are probably out there right now—it’s just an issue of figuring out which concepts are truly significant.
On the question of milestones, here is one of particular interest to me. A data compressor implicitly contains a statistical model. One can sample from that model by feeding a random sequence of bits to the decoder component. Let’s say we built a specialized compressor for images of the Manhattan streetscape. Now if the compressor is very good, samples from it will be indistinguishable from real images of Manhattan. I think it will be a huge milestone if someone can build a compressor that generates images realistic enough to fool humans—a kind of visual Turing Test. That goal now seems impossibly distant, but it can be approached by a direct procedure: build a large database of streetscape images, and conduct a systematic search for the compressor that reduces the database to the shortest possible size. I think the methods required to achieve that would constitute an FI, and if the Schmidhuber/Hutter/Legg group can pull that off, I’ll hail them as truly great scientists.
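The link between compression and modeling can be illustrated with a general-purpose compressor: its output length is, implicitly, a model's code length, so data that fits the compressor's implicit model compresses far better than data that doesn't. A minimal sketch in Python, with zlib standing in for a real image compressor and a repeated phrase standing in for structured streetscape data:

```python
import os
import zlib

# Data that fits a repeated-pattern model, vs. patternless noise.
structured = b"the cat sat on the mat. " * 200
noise = os.urandom(len(structured))

# A compressor's output length acts as a code length under its
# implicit model: expected data gets a short code, noise stays long.
short = len(zlib.compress(structured))
long_ = len(zlib.compress(noise))
print(short < long_)  # True: structured data compresses far better
```

The "systematic search" above would then be a search over such compressors for the one that minimizes total code length on the streetscape database, i.e. a minimum description length criterion.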
the Support Vector Machine was not discovered until the 90s.
Why not? I’m not familiar with VC-theory, but the basic idea of separating two sets of points by a maximum-margin hyperplane doesn’t seem that complex. What made this difficult?
Don’t quote me on this, but I believe the key insight is that the complexity of the max-margin hyperplane model depends not on the number of dimensions of the feature space (which may be very large) but on the number of data points used to define the hyperplane (the support vectors), and the latter quantity is usually small. Though that realization is intuitively plausible, it took VC-theory to actually prove it.
The second part of this confuses me: standard compression schemes are good by this measure, since images compressed by them are still quite accurate. Did you mean that random data decompressed by the algorithm is indistinguishable from real images of Manhattan?
To sample from a compressor, you generate a sequence of random bits and feed it into the decompressor component. If the compressor is very well-suited to Manhattan images, the output of this process will be synthetic images that resemble the real city images. If you try to sample from a standard image compressor, you will just get a greyish haze.
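The sampling procedure can be made concrete with a toy compressor. The sketch below is my own illustration (not anyone's actual system): a Huffman code over a hypothetical three-symbol "scene" alphabet. Feeding uniform random bits into the decoder yields symbols with roughly the model's probabilities — only roughly, because Huffman code lengths are whole bits:

```python
import heapq
import random
from collections import Counter

def build_huffman_tree(probs):
    # Heap entries are (probability, tie-breaker, tree); a tree is
    # either a bare symbol or a (left, right) pair.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (t1, t2)))
        counter += 1
    return heap[0][2]

def sample_from_decoder(tree, bits):
    # Feed a bit stream into the Huffman decoder; each decoded
    # symbol is (approximately) a sample from the model.
    out, node = [], tree
    for b in bits:
        node = node[b]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return out

model = {'street': 0.5, 'car': 0.25, 'tree': 0.25}
rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(20000)]
samples = sample_from_decoder(build_huffman_tree(model), bits)
freqs = Counter(samples)
print(freqs.most_common())  # 'street' appears about half the time
```

The dyadic probabilities here make Huffman sampling exact; a real compressor built on arithmetic coding would make the decoder-of-random-bits trick work for arbitrary probabilities.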
I call this the veridical simulation principle. It is useful because it allows a researcher to detect the ways in which a model is deficient. If the model doesn’t handle shadows correctly, the researcher will realize this when the sampling process produces an image of a tree that casts no shade.
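The same principle applies beyond images. As a minimal sketch with hypothetical data (Python stdlib only): fit a Gaussian to strictly positive, right-skewed data, then sample from the fitted model; the appearance of negative samples immediately exposes a deficiency that a single goodness-of-fit number might hide:

```python
import random
import statistics

# Hypothetical "real" data: strictly positive and right-skewed
# (lognormal stand-in for, say, pixel intensities).
rng = random.Random(0)
real = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Fit a Gaussian model, deliberately the wrong family.
mu = statistics.fmean(real)
sigma = statistics.stdev(real)

# Sample from the fitted model and inspect the samples.
fake = [rng.gauss(mu, sigma) for _ in range(5000)]

# The real data is never negative; the model's samples often are,
# which reveals the deficiency directly.
print(min(real) > 0)             # True
print(any(x < 0 for x in fake))  # True
```

This is the tree-with-no-shadow test in miniature: the flaw shows up in the samples even though the fitted parameters look perfectly reasonable.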
OK, that makes sense. It’s isomorphic to doing model checking by looking at data generated by your model.