This seems like a fairly small extension of a very well-studied problem in image recognition. Training a CNN to distinguish whatever your reference humans want to consider a flower, using 2- or 3-d image data (or additional dimensions or inputs, such as cell structures) seems pretty close to solved.
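A minimal sketch of the kind of setup that paragraph has in mind might look like the following, assuming a pretrained torchvision ResNet and a labelled flower / not-flower image folder (the paths, model choice, and hyperparameters are placeholder assumptions, not anything specified in the OP):

```python
# Hedged sketch: fine-tune a pretrained CNN as a flower / not-flower classifier.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for the pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes a folder layout like data/train/flower/... and data/train/not_flower/...
train_data = datasets.ImageFolder("data/train", transform=preprocess)
loader = DataLoader(train_data, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: flower / not-flower

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown; real training would run several
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```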
Abstraction is a sidetrack from that aspect of the problem. Humans seem to use abstraction for this, but that’s because we introspect our operation badly. Humans use abstraction to COMMUNICATE and (perhaps) EXTRAPOLATE beyond the observed classification instances. The abstraction is a very efficient compression, necessary because it’s impractical to communicate an entire embedding from one brain to another. And the compression may also be an important similarity mechanism for guessing, for instance, what will happen if we eat one.
But abstraction is not fundamental to recognition—we have very deep neural networks for that, which we can’t really observe or understand on their own level.
The post already addresses multiple failure modes for 2d image-recognition-style approaches, and I’ll throw in one more for 3d: underlying ontology shifts. Everything in the OP holds just fine if the low-level physics simulator switches from classical to quantum. Try that with a CNN.
Also, the problem statement does not say “train to recognize a flower.” There is no training data, other than whatever boundary is originally drawn around the one flower.
Sure, but the argument applies for whatever sensory input humans are using—audio, touch, and smell are similar (at this level) to vision—they’re fairly deep neural networks with different layers performing different levels of classification. And the abstraction is a completely different function than the perception.
My main point is that introspectable abstraction is not fundamental to perception and distinguishing flower from not-flower. It _is_ important to communication and manipulation of the models, but your example of computing the 4D (including time as a dimension) bounding box for a specific flower is confusing different levels.
I think you’re confused about the problem. Recognizing a flower in an image is something solved by neural nets, yes. But the CNNs which solve that problem have no way to tell apart two flowers which look similar, absent additional training on that specific task—heck, they don’t even have a notion of what “different flowers” means other than flowers looking different in an image.
I totally agree that abstraction (introspectable or not) is not fundamental to perception. Yet humans pretty clearly have a notion of “flower” which involves more than just what-the-flower-looks-like. That is the rough notion which the OP is trying to reconstruct.
The “absent training on that specific task” matters a lot here, as does the precise limits of “specific”. There are commercially-available facial-identification (not just recognition) systems that are good enough for real-money payments, and they don’t train on the specific faces to be recognized. Likewise commercial movement-tracking systems that care about history and continuity, in addition to image similarity (so, for instance, twins don’t confuse them).
A flower-identification set of models would be easier than this, I expect.
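For concreteness, identification without training on the specific individuals typically reduces to something like the sketch below: a fixed embedding network plus nearest-neighbour matching against enrolled references. This is a generic illustration rather than any particular commercial system, and the backbone, similarity threshold, and enrolment images are placeholder assumptions.

```python
# Hedged sketch: open-set identification by embedding similarity, with no
# per-identity training -- new identities are enrolled by storing an embedding.
import torch
import torch.nn.functional as F
from torchvision import models

# Fixed, pretrained backbone used purely as an embedding function.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # penultimate features serve as the embedding
backbone.eval()

@torch.no_grad()
def embed(images):
    # images: (N, 3, 224, 224), preprocessed the way the backbone expects
    return F.normalize(backbone(images), dim=1)

# Enrolment is just storing one embedding per identity -- no gradient updates.
# Placeholder reference images; a real system would use actual photos.
enrolled = {
    "identity_a": torch.rand(3, 224, 224),
    "identity_b": torch.rand(3, 224, 224),
}
gallery = {name: embed(img[None])[0] for name, img in enrolled.items()}

def identify(probe_image, threshold=0.7):  # threshold value is an assumption
    """Return the best-matching enrolled identity, or None if nothing is close."""
    probe = embed(probe_image[None])[0]
    name, score = max(
        ((n, float(torch.dot(probe, g))) for n, g in gallery.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None
```

The flower case would have the same shape: the models are trained once to produce useful embeddings, and individual flowers are only ever enrolled and matched, never trained on.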
> and (perhaps) EXTRAPOLATE beyond the observed classification instances.
seems to be sweeping a lot under the rug. Sample complexity reduction is likely a big chunk of transfer learning.
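To make the sample-complexity point concrete: the usual transfer-learning move is to freeze a pretrained backbone and fit only a small head on the handful of new labelled examples, roughly as in the sketch below (the backbone choice, data, and hyperparameters are placeholder assumptions):

```python
# Hedged sketch: few-shot fine-tuning with a frozen pretrained feature extractor,
# so only the small linear head has to be learned from the new examples.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()
for p in backbone.parameters():
    p.requires_grad = False  # frozen: this is the "transferred" knowledge
backbone.eval()

head = nn.Linear(512, 2)  # 512 = resnet18 feature dimension; 2 target classes
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder few-shot data standing in for a handful of labelled images.
few_shot_x = torch.rand(8, 3, 224, 224)
few_shot_y = torch.randint(0, 2, (8,))

for _ in range(100):
    with torch.no_grad():
        features = backbone(few_shot_x)
    loss = loss_fn(head(features), few_shot_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```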