but I still don’t see how the effort is going to take 1 or 2 centuries. A century is a loooong time.
I think the following quote is illustrative of the problems facing the field:
After [David Marr] joined us, our team became the most famous vision group in the world, but the one with the fewest results. His idea was a disaster. The edge finders they have now using his theories, as far as I can see, are slightly worse than the ones we had just before taking him on. We’ve lost twenty years.
-Marvin Minsky, quoted in “AI” by Daniel Crevier.
Some notes and interpretation of this comment:
Most vision researchers, if asked who is the most important contributor to their field, would probably answer “David Marr”. He set the direction for subsequent research in the field; students in introductory vision classes read his papers first.
Edge detection is a tiny part of vision, and vision is a tiny part of intelligence, but at least in Minsky’s view, no progress (or reverse progress) was achieved in twenty years of research by the leading lights of the field.
There is no standard method for evaluating edge detector algorithms, so it is essentially impossible to measure progress in any rigorous way.
I think this kind of observation justifies AI-timeframes on the order of centuries.
Edge detection is rather trivial. Visual recognition, however, is not, and there certainly are benchmarks and comparable results in that field. Have you browsed the recent publications of Poggio et al. at the MIT vision lab? There has been a lot of recent progress, with results matching human levels for quick recognition tasks.
Also, vision is not a tiny part of intelligence. It’s the single largest functional component of the cortex, by far. The cortex uses the same essential low-level optimization algorithm everywhere, so understanding vision at the detailed level is a good step towards understanding the whole thing.
And finally, most relevant for AGI, the higher visual regions also give us the capacity for visualization and are critical for higher creative intelligence. Literally all scientific discovery and progress depends on this system.
“visualization is the key to enlightenment” and all that
It’s only trivial if you define an “edge” in a trivial way, e.g. as a set of points where the intensity gradient is greater than a certain threshold. This kind of definition has little use: given a picture of a tree trunk, this definition will indicate many edges corresponding to the ridges and corrugations of the bark, and will not highlight the meaningful edge between the trunk and the background.
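To make the point concrete, here is a minimal sketch of that trivial definition, in Python with NumPy (the image and threshold value here are purely illustrative): flag every pixel whose gradient magnitude exceeds a fixed threshold. On a bark-like texture it fires everywhere, which is exactly the failure described above.

```python
import numpy as np

def threshold_edges(image, threshold=0.1):
    """Return a boolean edge map from a 2-D grayscale image in [0, 1]."""
    # Finite-difference approximation of the intensity gradient.
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Every pixel above the threshold counts as an "edge": fine bark
    # texture and the trunk/background boundary are treated identically.
    return magnitude > threshold

# A noisy "bark-like" texture yields a dense, mostly useless edge map.
bark = np.random.rand(64, 64)
print(threshold_edges(bark).mean())  # fraction of pixels flagged as edges
```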
I don’t believe that there is much real progress recently in vision. I think the state of the art is well illustrated by the “racist” HP web camera that detects white faces but not black faces.
Also, vision is not a tiny part of intelligence [...] The cortex uses the same essential low-level optimization algorithm everywhere,
I actually agree with you about this, but I think most people on LW would disagree.
Whether you are talking about Canny edge filters or Gabor-like edge detection (more similar to what V1 self-organizes into), they are all still relatively simple: trivial compared to AGI. Trivial as in something you could code in a few hours for the screen filter system in a modern game render engine.
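For illustration, here is a rough sketch of the kind of Gabor-like oriented edge filter bank referred to above; it is a toy version in Python/NumPy/SciPy with illustrative, untuned parameters, not anyone’s production filter.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0):
    """Real part of a Gabor filter: a sinusoid windowed by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the carrier so the filter responds to edges at angle theta.
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_theta / wavelength)
    return envelope * carrier

def oriented_edge_responses(image, n_orientations=4):
    """Convolve an image with a small bank of oriented Gabor filters."""
    responses = []
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations
        responses.append(
            convolve2d(image, gabor_kernel(theta=theta),
                       mode="same", boundary="symm")
        )
    return np.stack(responses)  # shape: (n_orientations, height, width)

image = np.random.rand(64, 64)  # placeholder grayscale image
print(oriented_edge_responses(image).shape)
```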
The particular problem you point out with the tree trunk is a scale problem and is easily handled in any good vision system.
An edge detection filter is just a building block; it’s not the complete system.
In the HVS, initial edge preprocessing is done in the retina itself, which essentially applies on-center, off-surround (difference-of-Gaussian) filters, built from the same kind of Gaussian blurs used as low-pass filters in Photoshop. The output of the retina is thus essentially a multi-resolution image set, similar to a wavelet decomposition. The image at this stage becomes a series of edge differences (local gradients), but at numerous spatial scales.
The high-frequency edges, such as the ridges and corrugations of the bark, are cleanly separated from the more important low-frequency edges separating the tree trunk from the background. V1 then detects edge orientations at these various scales, and higher layers start recognizing increasingly complex statistical patterns of edges across larger fields of view.
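Here is a minimal sketch of that multi-scale idea, assuming simple difference-of-Gaussian (on-center, off-surround style) filtering at a handful of illustrative scales: fine bark texture dominates the small-scale bands, while the coarse trunk/background boundary survives into the large-scale bands.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return a list of band-pass (difference-of-Gaussian) images, one per scale."""
    image = image.astype(float)
    bands = []
    for sigma in sigmas:
        center = gaussian_filter(image, sigma)          # narrow Gaussian blur
        surround = gaussian_filter(image, 2.0 * sigma)  # wider Gaussian blur
        # On-center, off-surround response: local gradients at this scale.
        bands.append(center - surround)
    return bands

image = np.random.rand(128, 128)  # placeholder grayscale image
for sigma, band in zip((1.0, 2.0, 4.0, 8.0), dog_pyramid(image)):
    print(f"sigma={sigma}: mean band energy {np.abs(band).mean():.4f}")
```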
Whether there has been much real progress recently in computer vision is relative to one’s expectations, but the current state of the art in research systems, at least, is far beyond your simplistic assessment. I have a layman’s overview of the HVS here. If you really want to know about the current state of the art in research, read some recent papers from a place like Poggio’s lab at MIT.
In the product space, the HP web camera example is also very far from the state of the art; I’m surprised that you posted that.
There is free eye-tracking software you can get (running on your PC) that can use your webcam to track where your eyes are currently focused in real time. That’s still not even the state of the art in the product space; that would probably be the systems used in the more expensive robots, which of course lag the research state of the art.