Anyway, I would say that the word “I” is generally referring to the goings-on in the global workspace circuits in the brain, which we can think of as hierarchically above the visual system. The workspace can query the visual system, basically by sending a suite of top-down constraints into the visual system PGM (“there’s definitely a vertical line here!” or whatever), allowing the visual system to do its probabilistic inference, and then branching based on the status of some other visual system variable(s).
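To make that picture a bit more concrete, here’s a minimal toy sketch (my own illustration, not anything claimed in the conversation) of a “query” as a clamped, overconfident top-down value: we inject near-certainty about one variable into a tiny discrete model, let it do its inference, and then branch on the posterior of another variable. The variable names and numbers are all made up.

```python
import numpy as np

# Toy two-variable model: edge orientation {vertical, horizontal} -> object {pole, fence rail}.
# All numbers are invented for illustration.
p_object_given_orientation = np.array([
    [0.9, 0.1],   # vertical edge   -> probably a pole
    [0.2, 0.8],   # horizontal edge -> probably a fence rail
])

def infer_object(orientation_belief):
    """Marginalize the belief over orientation through the conditional table."""
    posterior = orientation_belief @ p_object_given_orientation
    return posterior / posterior.sum()

# The "query": clamp the orientation variable to a near-certain top-down value
# ("there's definitely a vertical line here!"), regardless of what's actually out there.
top_down_constraint = np.array([0.99, 0.01])

# Let the lower-level model run its probabilistic inference...
posterior_object = infer_object(top_down_constraint)

# ...and then the "workspace" branches on the status of the other variable.
if posterior_object[0] > 0.5:
    print("read-back: probably a pole", posterior_object)
else:
    print("read-back: probably a fence rail", posterior_object)
```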
> How would you query low-level details from a high-level node? Don’t the hierarchically high-up nodes represent things which range over longer distances in space/time, eliding low-level details like lines?
My explanation would be: it’s not a strict hierarchy; there are plenty of connections from the top to the bottom (or at least near-bottom). “Feedforward and feedback projections between regions typically connect to multiple levels of the hierarchy”, and “It has been estimated that 40% of all possible region-to-region connections actually exist which is much larger than a pure hierarchy would suggest” (ref). (I’ve heard the same claim elsewhere too.) Also, we need to do compression (throw out information) to get from raw input to the top level, but I think a lot of that compression is accomplished by only attending to one “object” at a time, rapidly flitting from one to another. I’m not sure how far that gets you, but I think it’s at least part of the story, in that it reduces the need to throw out low-level details. Another thing is saccades: maybe you can’t make high-level predictions about literally every cortical column in V1, but if you can access a subset of columns, then saccades can fill in the gaps.
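For a sense of scale on that 40% figure, here’s a back-of-the-envelope comparison against a strict tree-shaped hierarchy; the region count is made up purely for illustration, and connections are treated as undirected for simplicity.

```python
# Rough comparison: strict hierarchy vs. "40% of all possible region-to-region connections".
# The region count is arbitrary; the point is just the order-of-magnitude gap.
n_regions = 100
possible_pairs = n_regions * (n_regions - 1) // 2   # every region-to-region pair
tree_hierarchy_edges = n_regions - 1                # a pure tree-shaped hierarchy

print(tree_hierarchy_edges / possible_pairs)        # ~0.02: a strict hierarchy uses ~2% of pairs
print(int(0.40 * possible_pairs))                   # ~1980 connections if 40% of pairs exist
```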
> Why is a query represented as an overconfident false belief?
I have pretty high confidence that “visual imagination” is accessing the same world-model database and machinery as “parsing a visual scene” (and likewise “imagining a sound” vs “parsing a sound”, etc.). I find it hard to imagine any alternative: it doesn’t seem plausible that we have two copies of this giant data structure and machinery and somehow keep them synchronized. And introspectively, it does seem to be true that there’s some competition, where it’s hard to simultaneously imagine a sound while processing incoming sounds, etc. I mean, it’s always hard to do two things at once, but this seems especially hard.
So then the question is: how can you imagine seeing something that isn’t there, without the imagination being overruled by bottom-up sensory input? I guess there has to be some mechanism by which top-level processing can choose to prevent (a subset of) sensory input from having its usual strong influence on (a subset of) the network. I don’t know what that mechanism is.
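One way to make that guess concrete (my framing, not something established in the conversation) is to treat the “usual strong influence” as a gain on the bottom-up evidence, a gain that top-level processing can turn down for the relevant subset of the network. In this toy sketch, with the gain at 1 the sensed evidence overrules the imagined content, and with the gain turned down the top-down “imagined” prior dominates; all numbers are invented.

```python
import numpy as np

def combine(top_down_prior, bottom_up_likelihood, sensory_gain=1.0):
    """Posterior over a binary feature ("vertical line here?" yes/no),
    with the bottom-up evidence raised to a controllable gain in [0, 1]."""
    posterior = top_down_prior * bottom_up_likelihood ** sensory_gain
    return posterior / posterior.sum()

imagined = np.array([0.8, 0.2])    # top-down: "imagine that a vertical line is present"
sensed   = np.array([0.05, 0.95])  # bottom-up: the retina strongly says there's no line

print(combine(imagined, sensed, sensory_gain=1.0))  # sensed evidence wins: "no line"
print(combine(imagined, sensed, sensory_gain=0.1))  # gain turned down: the imagined line survives
```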
> I have pretty high confidence that “visual imagination” is accessing the same world-model database and machinery as “parsing a visual scene” (and likewise “imagining a sound” vs “parsing a sound”, etc.)
Update: Oops! I just learned that what I said there is kinda wrong.
What I should have said was: the machinery / database used for “visual imagination” is a subset of the machinery / database used for “parsing a visual scene”.
…But it’s a strict subset. Low-level visual processing is all about taking the massive flood of incoming retinal data and distilling it into a more manageable subspace of patterns, and that low-level machinery is not useful for visual imagination. See: visual mental imagery engages the left fusiform gyrus, but not the occipital lobe.
(To be clear, the occipital lobe is not involved in visual imagination at inference time; it’s obviously involved earlier, when the left fusiform gyrus is first learning its vocabulary of visual patterns.)
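As a cartoon of that “strict subset” claim (the stage names and structure below are just illustrative labels, not anatomical claims): perception runs raw input through a low-level encoding stage and then into the shared high-level machinery, whereas imagination drives the same high-level machinery directly from a stored pattern code, skipping the low-level stage.

```python
def low_level_encoder(retinal_input):
    """Occipital-lobe-ish stage: distill raw incoming data into a pattern code.
    Only needed when there is actual bottom-up input (or during learning)."""
    return ("pattern_code", retinal_input)

def high_level_model(pattern_code):
    """Fusiform-ish stage: the shared world-model machinery, used both ways."""
    return ("scene_interpretation", pattern_code)

def perceive(retinal_input):
    # Parsing a visual scene: bottom-up through the full stack.
    return high_level_model(low_level_encoder(retinal_input))

def imagine(stored_pattern_code):
    # Visual imagination: start from a stored pattern code and use only the
    # high-level machinery (a strict subset of the perception pipeline).
    return high_level_model(stored_pattern_code)

print(perceive("raw retinal data"))
print(imagine(("pattern_code", "remembered apple")))
```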
I don’t think that affects anything else in the conversation; I just wanted to set the record straight. :)