The post already addresses multiple failure modes for 2D image-recognition-style approaches, and I’ll throw in one more for 3D: underlying ontology shifts. Everything in the OP holds just fine if the low-level physics simulator switches from classical to quantum. Try that with a CNN.
Also, the problem statement does not say “train to recognize a flower”. There is no training data other than whatever boundary is originally drawn around the one flower.
Sure, but the argument applies to whatever sensory input humans are using. Audio, touch, and smell are similar (at this level) to vision: they’re fairly deep neural networks with different layers performing different levels of classification. And the abstraction is a completely different function from the perception.
My main point is that introspectable abstraction is not fundamental to perception and distinguishing flower from not-flower. It _is_ important to communication and manipulation of the models, but your example of computing the 4D (including time as a dimension) bounding box for a specific flower is confusing different levels.
I think you’re confused about the problem. Recognizing a flower in an image is something solved by neural nets, yes. But the CNNs which solve that problem have no way to tell apart two flowers which look similar, absent additional training on that specific task—heck, they don’t even have a notion of what “different flowers” means other than flowers looking different in an image.
I totally agree that abstraction (introspectable or not) is not fundamental to perception. Yet humans pretty clearly have a notion of “flower” which involves more than just what-the-flower-looks-like. That is the rough notion which the OP is trying to reconstruct.
The “absent training on that specific task” matters a lot here, as do the precise limits of “specific”. There are commercially available facial-identification (not just recognition) systems that are good enough for real-money payments, and they don’t train on the specific faces to be recognized. Likewise, there are commercial movement-tracking systems that care about history and continuity in addition to image similarity (so, for instance, twins don’t confuse them).
A flower-identification set of models would be easier than this, I expect.
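To make the "history and continuity, in addition to image similarity" idea concrete, here is a minimal toy sketch. It is not how any commercial tracker actually works, and every name in it (`match_score`, `assign`, `max_jump`, `w_appearance`, the dictionary layout) is hypothetical. It assumes some pretrained model has already produced an embedding for each detection, and it simply blends embedding similarity with how far the object would have had to move since the last frame.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_score(track, detection, max_jump=5.0, w_appearance=0.5):
    """Blend appearance similarity with spatial continuity.

    max_jump and w_appearance are made-up tuning knobs for this sketch.
    """
    appearance = cosine(track["embedding"], detection["embedding"])
    dist = math.dist(track["position"], detection["position"])
    # Continuity is 1.0 for "did not move" and 0.0 at or beyond max_jump.
    continuity = max(0.0, 1.0 - dist / max_jump)
    return w_appearance * appearance + (1.0 - w_appearance) * continuity


def assign(tracks, detections):
    """Greedy assignment: take the best-scoring (track, detection) pairs first."""
    pairs = sorted(
        (
            (match_score(track, det), track_id, det_idx)
            for track_id, track in tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    used_tracks, used_dets, result = set(), set(), {}
    for _score, track_id, det_idx in pairs:
        if track_id in used_tracks or det_idx in used_dets:
            continue
        result[track_id] = det_idx
        used_tracks.add(track_id)
        used_dets.add(det_idx)
    return result


# Two look-alike flowers: identical embeddings, different positions.
tracks = {
    "flower_a": {"embedding": [1.0, 0.0], "position": (0.0, 0.0)},
    "flower_b": {"embedding": [1.0, 0.0], "position": (4.0, 0.0)},
}
# Next frame: the detections arrive in the opposite order.
detections = [
    {"embedding": [1.0, 0.0], "position": (4.2, 0.1)},
    {"embedding": [1.0, 0.0], "position": (0.3, 0.0)},
]

print(assign(tracks, detections))
```

Appearance alone cannot separate the two flowers here, since their embeddings are identical; the continuity term is what keeps the identities straight. Note that the sketch still leans on a pretrained embedding and a fixed notion of position, so it does not speak to the no-training-data or ontology-shift objections raised earlier in the thread.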