Edge detection is rather trivial. Visual recognition, however, is not, and there certainly are benchmarks and comparable results in that field. Have you browsed the recent publications of Poggio et al. at the MIT vision lab? There has been a lot of recent progress, with results matching human levels for quick recognition tasks.
Also, vision is not a tiny part of intelligence. It’s the single largest functional component of the cortex, by far. The cortex uses the same essential low-level optimization algorithm everywhere, so understanding vision at the detailed level is a good step towards understanding the whole thing.
Finally, and most relevant for AGI, the higher visual regions also give us the capacity for visualization and are critical for higher creative intelligence. Literally all scientific discovery and progress depends on this system.
“visualization is the key to enlightenment” and all that
It’s only trivial if you define an “edge” in a trivial way, e.g. as a set of points where the intensity gradient is greater than a certain threshold. This kind of definition has little use: given a picture of a tree trunk, this definition will indicate many edges corresponding to the ridges and corrugations of the bark, and will not highlight the meaningful edge between the trunk and the background.
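To make the objection concrete, here is a minimal sketch of the naive gradient-threshold definition applied to a made-up 1-D scanline. The pixel values, the two thresholds, and the function name `threshold_edges` are illustrative assumptions, not taken from any real image or library.

```python
def threshold_edges(scanline, threshold):
    """Indices i where the jump between pixel i and pixel i + 1 exceeds threshold."""
    return [
        i for i in range(len(scanline) - 1)
        if abs(scanline[i + 1] - scanline[i]) > threshold
    ]

# Hypothetical scanline: 8 pixels of bright background, then a trunk whose
# bark alternates between light ridges (160) and dark grooves (40).
background = [200] * 8
trunk = [160, 40] * 6
scanline = background + trunk

low = threshold_edges(scanline, threshold=30)
high = threshold_edges(scanline, threshold=50)

# With the low threshold the trunk boundary (index 7) is one hit among many;
# with the stricter threshold the boundary is the first thing to disappear.
print(len(low), 7 in low)
print(len(high), 7 in high)
```

In this toy case the bark ridges have higher local contrast than the trunk outline, so no single threshold isolates the outline.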
I don’t believe there has been much real progress in vision recently. I think the state of the art is well illustrated by the “racist” HP web camera that detects white faces but not black faces.
“Also, vision is not a tiny part of intelligence [...] The cortex uses the same essential low-level optimization algorithm everywhere”
I actually agree with you about this, but I think most people on LW would disagree.
Whether you are talking about Canny edge filters or Gabor-like edge detection closer to what V1 self-organizes into, they are all still relatively simple: trivial compared to AGI. Trivial as in something you could code in a few hours for the screen filter system in a modern game render engine.
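For a sense of scale, a Gabor-style oriented edge filter fits in a couple of dozen lines. This is a rough sketch, not a model of V1: the isotropic envelope, the parameter values, the test image, and the names `gabor_kernel` and `filter_response` are all illustrative assumptions.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Square Gabor kernel: a Gaussian envelope times a cosine carrier along theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_response(image, kernel, cx, cy):
    """Weighted sum of the image patch centered on (cx, cy)."""
    half = len(kernel) // 2
    total = 0.0
    for ky, row in enumerate(kernel):
        for kx, weight in enumerate(row):
            total += weight * image[cy + ky - half][cx + kx - half]
    return total

# Hypothetical 9x9 image with a vertical step edge: dark left, bright right.
image = [[0.0 if x < 4 else 1.0 for x in range(9)] for _ in range(9)]

# phase = -pi/2 makes the carrier odd-symmetric, so the kernel responds to steps.
vertical_detector = gabor_kernel(7, wavelength=8.0, theta=0.0, sigma=2.0, phase=-math.pi / 2)
horizontal_detector = gabor_kernel(7, wavelength=8.0, theta=math.pi / 2, sigma=2.0, phase=-math.pi / 2)

vertical = filter_response(image, vertical_detector, 4, 4)
horizontal = filter_response(image, horizontal_detector, 4, 4)
print(vertical, horizontal)
```

The detector tuned to the edge's orientation gives a clearly positive response, while the one rotated by 90 degrees gives essentially zero.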
The particular problem you point out with the tree trunk is a scale problem and is easily handled in any good vision system.
An edge detection filter is just a building block; it’s not the complete system.
In the human visual system (HVS), initial edge preprocessing is done in the retina itself, which essentially applies on-center, off-surround filters (a difference of Gaussian blurs, each similar to a low-pass filter in Photoshop). The output of the retina is thus essentially a multi-resolution image set, similar to a wavelet decomposition. The image output at this stage becomes a series of edge differences (local gradients), but at numerous spatial scales.
The high-frequency edges, such as the ridges and corrugations of the bark, are cleanly separated from the more important low-frequency edges separating the tree trunk from the background. V1 then detects edge orientations at these various scales, and higher layers start recognizing increasingly complex statistical patterns of edges across larger fields of view.
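The scale separation described above can be sketched on a toy 1-D scanline, using a difference of Gaussians as a stand-in for center-surround filtering. The scanline values, the sigma values, the 2:1 surround ratio, and all function names are assumptions for illustration; this is not a model of the retina.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Gaussian blur with mirrored borders (the edge sample is not repeated)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if j < 0:
                j = -j
            elif j > last:
                j = 2 * last - j
            acc += w * signal[j]
        out.append(acc)
    return out

def center_surround(signal, sigma):
    """Difference of Gaussians: a narrow center blur minus a surround twice as wide."""
    center = blur(signal, sigma)
    surround = blur(signal, 2 * sigma)
    return [c - s for c, s in zip(center, surround)]

# Hypothetical scanline: bright background, then bark alternating 160/40.
background = [200] * 40
trunk = [160, 40] * 20
scanline = background + trunk  # the trunk starts at index 40

fine = center_surround(scanline, 0.5)
coarse = center_surround(scanline, 3.0)

deep_bark = range(65, 80)  # far enough inside the trunk to ignore the boundary
fine_in_bark = min(abs(fine[i]) for i in deep_bark)
coarse_in_bark = max(abs(coarse[i]) for i in deep_bark)
coarse_peak = max(range(len(coarse)), key=lambda i: abs(coarse[i]))
print(fine_in_bark, coarse_in_bark, coarse_peak)
```

On this toy input the fine scale responds strongly at every bark ridge, while the coarse scale is nearly silent inside the bark and peaks near the trunk boundary.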
Whether there has been much real progress recently in computer vision depends on one’s expectations, but the current state of the art, in research systems at least, is far beyond your simplistic assessment. I have a layman’s overview of HVS here. If you really want to know about the current state of the art in research, read some recent papers from a place like Poggio’s lab at MIT.
In the product space, the HP web camera example is also very far from the state of the art; I’m surprised that you posted that.
There is free eye-tracking software you can get (running on your PC) that can use your webcam to track where your eyes are currently focused in real time. That’s still not even the state of the art in the product space; that would probably be the systems used in the more expensive robots, and of course that lags the research state of the art.