In predictive processing, attention is a system that manipulates the confidence intervals on your predictions. Low attention → wide intervals → even a large mismatch between prediction and data doesn’t register as an error. High attention → tighter intervals → even a slight mismatch produces an error signal.
Hmm… that sounds a bit different from how I’ve understood attention in predictive processing to work. AFAIK, it’s not that attending to something tightens its confidence interval; it’s that things with tight confidence intervals (relevant for the task in question) get more attention.
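To make the contrast concrete, here’s a toy Python sketch of precision-weighted prediction error (all the names and numbers in it are made up for illustration, not taken from either comment or from the PP literature). On the reading in the parent comment, attention *sets* the precision; on my reading, precision is a property of the signal itself, and the channels whose precision-weighted errors are strongest are the ones that end up attended:

```python
def weighted_error(prediction, observation, precision):
    """Precision-weighted prediction error: with high precision (a tight
    'confidence interval') even a small mismatch yields a strong error
    signal; with low precision the same mismatch barely registers."""
    return precision * (observation - prediction)

# Same raw mismatch, different precisions.
print(weighted_error(0.0, 0.5, 0.1))  # 0.05 -> barely registers
print(weighted_error(0.0, 0.5, 2.0))  # 1.0  -> strong error signal

# Parent comment's reading: attention tightens/loosens the precision above.
# My reading: precision is estimated per channel, and attention goes to the
# channel whose weighted errors dominate.
errors = {
    "vision": weighted_error(0.0, 0.5, 2.0),          # precise, task-relevant channel
    "proprioception": weighted_error(0.0, 0.5, 0.1),  # imprecise channel
}
attended = max(errors, key=lambda k: abs(errors[k]))
print(attended)  # 'vision'
```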
So bottom-up attention would be computed by the subsystems that were broadcasting content into the GNW, and their attention assignments would be implicit in their output. E.g. if you are looking at someone’s face and a subsystem judges that the important thing to pay attention to is the fact that they are looking angry, then it would send the message “this person looks angry” to the GNW. And subsystems would have a combination of learned and innate weights for when their messages could grab control of the GNW and dominate it with what they are paying attention to, similar to the salience cues in the basal ganglia that allow some bids to dominate in specific situations.
Top-down attention would be computed by attentional subsystems interfacing with the GNW, picking out the specific signals in the GNW that seem most useful for the current task and strengthening those signals.
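As a toy illustration of these two routes (the subsystem names, salience numbers, and keyword-matching rule below are my own invented stand-ins, just to make the picture concrete), bottom-up attention shows up as weighted bids for the workspace, and top-down attention as a task-driven boost to whichever broadcast content is currently relevant:

```python
from dataclasses import dataclass

@dataclass
class Bid:
    source: str      # which subsystem sent the message
    content: str     # what it wants broadcast, e.g. "this person looks angry"
    salience: float  # learned/innate weight for grabbing the workspace

def bottom_up_winner(bids):
    """Bottom-up attention: the most salient bid takes over the workspace."""
    return max(bids, key=lambda b: b.salience)

def top_down_boost(bids, task_keywords, boost=1.0):
    """Top-down attention: strengthen the signals relevant to the current task."""
    return [Bid(b.source, b.content,
                b.salience + (boost if any(k in b.content for k in task_keywords) else 0.0))
            for b in bids]

bids = [
    Bid("face-reader", "this person looks angry", salience=0.9),
    Bid("auditory", "a door slammed somewhere", salience=0.6),
    Bid("interoception", "slightly hungry", salience=0.3),
]

print(bottom_up_winner(bids).content)                            # "this person looks angry"
print(bottom_up_winner(top_down_boost(bids, ["door"])).content)  # task flips the winner
```

With no task set, the angry-face message wins on raw salience; adding a door-related task is enough to flip the winner, which is roughly the bottom-up/top-down split I have in mind.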
The GNW seems like it can only broadcast “simple” or “small” things. A single image, a percept, a signal. Something like a hypothesis in the PP paradigm seems like too big and complex a thing to be “sent” on the GNW.
Is it? Like, suppose that a subsystem’s hypothesis is that you are seeing a person’s face; as a result, the image of a face is sent to the GNW. In that case, the single image that is transmitted into the GNW is the hypothesis. That hypothesis being in consciousness may then trigger an error signal due to not matching another subsystem’s data, causing an alternative hypothesis to be broadcast into consciousness.
That said, seeing a face usually involves more than just the sight of the face: thoughts about the person in question, their intentions, etc. My interpretation has been that once one subsystem has established that “this is a face” (and sends into consciousness a signal that highlights the facial features that it has computed to get the most attention), other subsystems then grab onto those features and send additional details and related information into consciousness. The overall hypothesis is formed by many distinct pieces of data submitted by different subsystems, e.g. “(1) I’m seeing a face, (2) which belongs to my friend Mary, (3) who seems to be happy; (4) I recall an earlier time when Mary was happy”.
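Here’s a rough sketch of that kind of incremental build-up (the subsystem functions and their trigger conditions are purely hypothetical, chosen only to mirror the (1)–(4) example above): one subsystem gets its initial hypothesis broadcast, and each other subsystem reacts to whatever is currently in the workspace by attaching its own related piece:

```python
def build_up(initial_broadcast, subsystems, rounds=3):
    """Each round, every subsystem reacts to what is currently 'in
    consciousness' and may contribute an additional piece of content."""
    workspace = [initial_broadcast]
    for _ in range(rounds):
        for react in subsystems:
            addition = react(workspace)
            if addition and addition not in workspace:
                workspace.append(addition)
    return workspace

# Hypothetical subsystems, each triggered by something already broadcast.
def face_identifier(ws):
    return "this is my friend Mary" if "I'm seeing a face" in ws else None

def expression_reader(ws):
    return "Mary seems happy" if "this is my friend Mary" in ws else None

def memory(ws):
    return "I recall an earlier time when Mary was happy" if "Mary seems happy" in ws else None

print(build_up("I'm seeing a face", [face_identifier, expression_reader, memory]))
# ["I'm seeing a face", "this is my friend Mary", "Mary seems happy",
#  "I recall an earlier time when Mary was happy"]
```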
Here’s an excerpt from Consciousness and the Brain that seems relevant:
In 1959, the artificial intelligence pioneer Oliver Selfridge introduced another useful metaphor: the “pandemonium.” He envisioned the brain as a hierarchy of specialized “daemons,” each of which proposes a tentative interpretation of the incoming image. Thirty years of neurophysiological research, including the spectacular discovery of visual cells tuned to lines, colors, eyes, faces, and even U.S. presidents and Hollywood stars, have brought strong support to this idea. In Selfridge’s model, the daemons yelled their preferred interpretation at one another, in direct proportion to how well the incoming image favored their own interpretation. Waves of shouting were propagated through a hierarchy of increasingly abstract units, allowing neurons to respond to increasingly abstract features of the image—for instance, three daemons shouting for the presence of eyes, nose, and hair would together conspire to excite a fourth daemon coding for the presence of a face. By listening to the most vocal daemons, a decision system could form an opinion of the incoming image—a conscious percept.
Selfridge’s pandemonium model received one important improvement. Originally, it was organized according to a strict feed-forward hierarchy: the daemons bellowed only at their hierarchical superiors, but a high-ranking daemon never yelled back at a low-ranking one or even at another daemon of the same rank. In reality, however, neural systems do not merely report to their superiors; they also chat among themselves. The cortex is full of loops and bidirectional projections. Even individual neurons dialogue with each other: if neuron α projects to neuron β, then β probably projects back to α. At any level, interconnected neurons support each other, and those at the top of the hierarchy can talk back to their subordinates, so that messages propagate downward at least as much as upward.
Simulation and mathematical modeling of realistic “connectionist” models with many such loops show that they possess a very useful property. When a subset of neurons is excited, the entire group self-organizes into “attractor states”: groups of neurons form reproducible patterns of activity that remain stable for a long duration. As anticipated by Hebb, interconnected neurons tend to form stable cell assemblies.
As a coding scheme, these recurrent networks possess an additional advantage—they often converge to a consensus. In neuronal networks that are endowed with recurrent connections, unlike Selfridge’s daemons, the neurons do not simply yell stubbornly at one another: they progressively come to an intelligent agreement, a unified interpretation of the perceived scene. The neurons that receive the greatest amount of activation mutually support one another and progressively suppress any alternative interpretation. As a result, missing parts of the image can be restored and noisy bits can be removed. After several iterations, the neuronal representation encodes a cleaned-up, interpreted version of the perceived image. It also becomes more stable, resistant to noise, internally coherent, and distinct from other attractor states. Francis Crick and Christof Koch describe this representation as a winning “neural coalition” and suggest that it is the perfect vehicle for a conscious representation.
The term “coalition” points to another essential aspect of the conscious neuronal code: it must be tightly integrated. Each of our conscious moments coheres as one single piece. When contemplating Leonardo da Vinci’s Mona Lisa, we do not perceive a disemboweled Picasso with detached hands, Cheshire cat smile, and floating eyes. We retrieve all these sensory elements and many others (a name, a meaning, a connection to our memories of Leonardo’s genius)—and they are somehow bound together into a coherent whole. Yet each of them is initially processed by a distinct group of neurons, spread centimeters apart on the surface of the ventral visual cortex. How do they get attached to one another?
One solution is the formation of a global assembly, thanks to the hubs provided by the higher sectors of cortex. These hubs, which the neurologist Antonio Damasio calls “convergence zones,” are particularly predominant in the prefrontal cortex but also in other sectors of the anterior temporal lobe, inferior parietal lobe, and a midline region called the precuneus. All send and receive numerous projections to and from a broad variety of distant brain regions, allowing the neurons there to integrate information over space and time. Multiple sensory modules can therefore converge onto a single coherent interpretation (“a seductive Italian woman”). This global interpretation may, in turn, be broadcast back to the areas from which the sensory signals originally arose. The outcome is an integrated whole. Because of neurons with long-distance top-down axons, projecting back from the prefrontal cortex and its associated high-level network of areas onto the lower-level sensory areas, global broadcasting creates the conditions for the emergence of a single state of consciousness, at once differentiated and integrated.
This permanent back-and-forth communication is called “reentry” by the Nobel Prize winner Gerald Edelman. Model neuronal networks suggest that reentry allows for a sophisticated computation of the best possible statistical interpretation of the visual scene. Each group of neurons acts as an expert statistician, and multiple groups collaborate to explain the features of the input. For instance, a “shadow” expert decides that it can account for the dark zone of the image—but only if the light comes from the top left. A “lighting” expert agrees and, using this hypothesis, explains why the top parts of the objects are illuminated. A third expert then decides that, once these two effects are accounted for, the remaining image looks like a face. These exchanges continue until every bit of the image has received a tentative interpretation.
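The recurrent, consensus-forming version of the pandemonium is easy to caricature in code. In the sketch below (the units, weights, and update rule are illustrative stand-ins of my own, not a model from the book), feature daemons excite a higher-level face daemon and it talks back to them, so the mutually supporting coalition stabilizes while an unconnected interpretation fades:

```python
def pandemonium(evidence, support, steps=20, decay=0.1):
    """Recurrent pandemonium: each daemon's activity is driven by its own
    bottom-up evidence plus support from the daemons connected to it; after
    repeated iterations the pattern settles into a stable 'coalition'."""
    activity = dict(evidence)
    for _ in range(steps):
        new = {}
        for unit, level in activity.items():
            boost = sum(weight * activity[other]
                        for other, weight in support.get(unit, {}).items())
            new[unit] = (1 - decay) * level + boost + evidence.get(unit, 0.0)
        total = sum(new.values())  # normalize so we track relative strength
        activity = {u: v / total for u, v in new.items()}
    return activity

evidence = {"eyes": 0.3, "nose": 0.3, "hair": 0.2, "face": 0.0, "teapot": 0.1}
support = {
    "face": {"eyes": 0.5, "nose": 0.5, "hair": 0.3},                      # features excite "face"
    "eyes": {"face": 0.3}, "nose": {"face": 0.3}, "hair": {"face": 0.3},  # and "face" talks back
}
final = pandemonium(evidence, support)
# "face" starts with zero direct evidence but ends up well above the isolated
# "teapot" daemon, which had some evidence but no coalition to support it.
print(final["face"] > final["teapot"])  # True
print(sorted(final, key=final.get, reverse=True))
```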
Hmm, yeah looks like I got PP attention backwards.
There are two layers! Predictive circuits are sorta “autonomously” creating a focus within the domain of what they predict, and then the “global” or top-down attention can either be an attentional subsystem watching the GNW, or the distributed attentional-relevancy gate around the GNW.
The pandemonium stuff is also a great model. In another comment I mentioned that I’m fuzzy on how tightly or loosely coupled different subsystems can be, and how they are organized, and I was unintentionally imagining them as quite monolithic entities.