(Now I’m trying to look at the wall of my room and decide whether I actually see pixels or ‘line segments’, which is an exercise that really ties my head in a knot.)
Sorry if I’m misunderstanding what you’re getting at but...
I don’t think there’s any point at which there are signals in your brain that correspond directly to something like pixels in a camera. Even in the retina, there’s supposedly predictive-coding data compression going on (I haven’t looked into that in detail). By the time the signals are going to the neocortex, they’ve been split into three data streams carrying different types of distilled data: magnocellular, parvocellular, and koniocellular (actually several types of konio, I think), if memory serves. There’s a theory I like about the information-processing roles of magno and parvo; nobody seems to have any idea what the konio information is doing, and neither do I. :-P
But does it matter whether the signals are superficially the same or not? If you do a lossless transformation from pixels into edges (for example), who cares, the information is still there, right?
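To make that concrete with a toy example (purely illustrative, not a claim about what the retina actually computes): simple difference coding turns a row of raw “pixel” intensities into an edge-like signal, and since the transform is invertible, the original pixels can be recovered exactly, i.e. zero information is lost.

```python
import numpy as np

# Toy illustration (not a model of the retina): difference coding turns
# raw 1-D "pixel" intensities into an edge-like representation, and the
# original signal can be recovered exactly, so no information is lost.

pixels = np.array([10, 10, 10, 50, 50, 20, 20, 20])

# "Edges": keep the first pixel as an anchor, then pixel-to-pixel differences.
edges = np.concatenate(([pixels[0]], np.diff(pixels)))

# Invert: a cumulative sum reconstructs the original pixels exactly.
reconstructed = np.cumsum(edges)

assert np.array_equal(reconstructed, pixels)
print(edges)          # [ 10   0   0  40   0 -30   0   0]  -> nonzero entries mark edges
print(reconstructed)  # [10 10 10 50 50 20 20 20]
```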
So then the question is: what information is in (say) V1 but not represented in V2 or higher areas, and do we have conscious access to that information? V1 has so many cortical columns processing so much data that, intuitively, there has to be compression going on.
I haven’t really thought much about how information compression in the neocortex works per se. Dileep George & Jeff Hawkins say here that there’s something like compressed sensing happening, and Randall O’Reilly says here that there’s error-driven learning (something like gradient descent) making sure that the top-down predictions are close enough to the input. Close on what metric though? Probably not pixel-to-pixel differences … probably more like “close in whatever compressed-sensing representation space is created by the V1 columns”...?
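Here’s a minimal sketch of what “close in a compressed representation space rather than pixel space” could mean. Everything in it is made up for illustration: a fixed random linear projection stands in for a V1-ish compressed-sensing code (random projections are the standard compressed-sensing trick, though whatever V1 does is presumably learned), and plain gradient descent stands in for error-driven learning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a V1-like compressed code: a fixed random
# linear projection from 64 "pixels" down to 16 features.
W = rng.normal(size=(16, 64)) / np.sqrt(64)

def encode(x):
    return W @ x

x_input = rng.normal(size=64)   # bottom-up input
x_pred = np.zeros(64)           # top-down prediction, to be adjusted

# Error-driven learning on the *encoded* representations, not raw pixels:
# minimize ||encode(x_pred) - encode(x_input)||^2 by gradient descent.
lr = 0.5
for _ in range(200):
    err = encode(x_pred) - encode(x_input)   # error measured in representation space
    grad = W.T @ err                         # gradient w.r.t. the prediction
    x_pred -= lr * grad

print(np.linalg.norm(encode(x_pred) - encode(x_input)))  # ~0: match in code space
print(np.linalg.norm(x_pred - x_input))                  # still large: pixel detail the code ignores
```

The point of the last two lines: the prediction ends up matching the input as measured in the code space while still differing a lot pixel-by-pixel, i.e. the error signal only “cares about” whatever features the compressed code keeps.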
Maybe a big part of the data compression is: we only attend to one object at a time, and everything else is lumped together into “background”. Like, you might think you’re paying close attention to both your hand and your pen, but actually you’re flipping back and forth, or else lumping the two together into a composite object! (I’m speculating.) Then the product space of every possible object in every possible arrangement in your field of view is broken into a dramatically smaller disjunctive space of possibilities, consisting of any one possible object in any one possible position. Now that you’ve thrown out 99.999999% of the information by only attending to one object at a time, there’s plenty of room for the GNW to have lots of detail about the object’s position, color, texture, motion, etc.
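Back-of-the-envelope arithmetic for that product-space vs. disjunctive-space claim (all the numbers are invented, just to show the scaling):

```python
import math

# Invented numbers, purely to show how the scaling works.
n_objects = 10_000      # distinguishable object identities
n_positions = 1_000     # distinguishable positions in the visual field
n_slots = 5             # objects simultaneously present in the scene

# Product space: every object in every arrangement, all tracked at once.
product_space = (n_objects * n_positions) ** n_slots

# Disjunctive space: any ONE attended object in any ONE position,
# with everything else lumped together as "background".
disjunctive_space = n_objects * n_positions

print(f"product space:     ~2^{math.log2(product_space):.0f} possibilities")
print(f"disjunctive space: ~2^{math.log2(disjunctive_space):.0f} possibilities")
# With these numbers: roughly 2^116 possibilities collapses to roughly 2^23.
```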
Not sure how helpful any of this is :-P