I’m biased: I have a strong prior belief that reinforcement learning should not be involved in sensory processing in the brain. The reason is simple: avoiding wishful thinking.
If there’s a positive reward for looking at beautiful ocean scenes, for example, then the RL solution would converge towards parsing whatever you look at as a beautiful ocean scene, whether it actually is or not!
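The failure mode being described can be sketched in a few lines. This is a toy simulation of my own construction (not from any cited work): a single "perceptual reporter" whose only parameter is the probability of reporting an ocean scene. Trained on reward for the report itself, it drifts toward always reporting ocean; trained predictively against what is actually there, it stays calibrated.

```python
# Toy illustration (my construction) of the "wishful thinking" failure mode:
# if a perceptual report is trained purely on reward, and the reward comes
# from *reporting* the pleasant percept, the reporter stops tracking reality.
import random

random.seed(0)

def train(reward_driven, steps=20000, lr=0.05):
    # Single parameter p = probability of reporting "ocean scene".
    # Half the inputs actually are ocean scenes, half are not.
    p = 0.5
    for _ in range(steps):
        is_ocean = random.random() < 0.5
        report_ocean = random.random() < p
        if reward_driven:
            # RL-style update: +1 reward whenever we *report* an ocean scene
            # (looking at beautiful scenes feels good), 0 otherwise.
            reward = 1.0 if report_ocean else 0.0
            # REINFORCE-style score-function update on the Bernoulli parameter.
            grad = (1.0 / p) if report_ocean else (-1.0 / (1.0 - p))
            p += lr * reward * grad * 0.01
        else:
            # Predictive update: move p toward the actual frequency of ocean scenes.
            target = 1.0 if is_ocean else 0.0
            p += lr * (target - p)
        p = min(max(p, 0.01), 0.99)  # keep the probability in a sane range
    return p

print(round(train(reward_driven=True), 2))   # 0.99: pinned to the ceiling
print(round(train(reward_driven=False), 2))  # stays near the true base rate (~0.5)
```

The reward-driven reporter converges on seeing ocean scenes everywhere because nothing in its learning signal references the actual input, which is exactly the wishful-thinking worry.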
That seems like a strong argument for why RL should not be the only mechanism for sensory processing, but a weaker one for why it shouldn’t be involved at all?
Without looking at the papers you cited, one reason one might expect RL to be involved in sensory processing would be the connection between perception and skill learning. Some of the literature on expertise that I’ve seen suggests that the development of a skill involves literally learning to see differently, in which case reinforcement learning should be associated with sensory processes, shaping one’s perception as one practices a skill so that one starts to see things as an expert would.
In this project, we studied the way nurses could tell when a very premature infant was developing a life-threatening infection. Beth Crandall, one of my coworkers, had gotten funding from the National Institutes of Health to study decision making and expertise in nurses. She arranged to work with the nurses in the neonatal intensive care unit (NICU) of a large hospital. These nurses cared for newly born infants who were premature or otherwise at risk.
Beth found that one of the difficult decisions the nurses had to make was to judge when a baby was developing a septic condition, in other words, an infection. These infants weighed only a few pounds; some of them, the microbabies, less than two pounds. When babies this small develop an infection, it can spread through their entire body and kill them before the antibiotics can stop it. Noticing the sepsis as quickly as possible is vital.
Somehow the nurses in the NICU could do this. They could look at a baby, even a microbaby, and tell the physician when it was time to start the antibiotic (Crandall and Getchell-Reiter 1993). Sometimes the hospital would do tests, and they would come back negative. Nevertheless, the baby went on antibiotics, and usually the next day the test would come back positive.
This is the type of skilled decision making that interests us the most. Beth began by asking the nurses how they were able to make these judgments. “It’s intuition,” she was told, or else “through experience.” And that was that. The nurses had nothing more to say about it. They looked. They knew. End of story. That was even more interesting: expertise that the person clearly has but cannot describe.

Beth geared up the methods we had used with the firefighters. Instead of asking the nurses general questions, such as, “How do you make this judgment?” she probed them on difficult cases where they had to use the judgment skills. She interviewed nurses one at a time and asked each to relate a specific case where she had noticed an infant developing sepsis. The nurses could recall incidents, and in each case they could remember the details of what had caught their attention. The cues varied from one case to the next, and each nurse had experienced a limited number of incidents. Beth compiled a master list of sepsis cues and patterns of cues in infants and validated it with specialists in neonatology.
Some of the cues were the same as those in the medical literature, but almost half were new, and some cues were the opposite of sepsis cues in adults. For instance, adults with infections tend to become more irritable. Premature babies, however, become less irritable. If a microbaby cried every time it was lifted up to be weighed and then one day it did not cry, that would be a danger signal to the experienced nurse. Moreover, the nurses were not relying on any single cue. They often reacted to a pattern of cues, each one subtle, that together signaled an infant in distress. [...]
In her project with the neonatal intensive care unit, Beth Crandall had asked the experienced nurses how they detected the early signs of sepsis. They told her it was experience and intuition. They did not know what they knew, because what they knew was perceptual: how to see. The only way Beth was going to find out anything useful was to have the nurses tell their stories of specific instances, each tied to a different set of perceptual cues. At the end of the interviews, Beth could draw all the stories together and compile a master list of cues to sepsis.
Because most of life is routine, and most objects and situations are mostly familiar, and because we’ve practiced our visual skills from early childhood, they suffice for most tasks, and go unnoticed. Needing to learn new visual skills is unusual for adults.
Some video games are an outstanding exception. Video games are designed to make learning new skills fun, and many games teach you to see in new ways. When you enter a new segment of the game, everything is happening much too fast; you have no idea where to look or what it means. Enemies come out of nowhere and kill you before you even see what they are. With practice, you learn to see things you couldn’t before, because you didn’t know how.
You are sneaking along a gloomy passageway in the necromancer’s tower. Suddenly you die. WTF just happened??
You reload the game from the last save point.
You are sneaking along that passageway and out of the corner of your eye you see something violent happen on the left side of the screen, and then you die. You reload.
You are sneaking along, looking left, and a golem leaps out of the archway on your left side and kills you.
This time, you’re watching the archway cautiously, and when the golem leaps out you hit it with a lightning bolt. A moment later, something happens on the right and you die.
Next try, you zap the golem with a lightning bolt, you flick your eyes right, and as the tomb over there opens, you manage to get one of the zombies with a fireball. But another one kills you. You notice that the headless zombie hesitated for a moment before attacking.
You zap the golem and incinerate the one-arm zombie with a fireball while the headless one gropes around. You cartwheel to dodge its attack, and finish it off with a mid-air flying dagger thrust. Awesomeness! Unfortunately the floorboards you land on are rotten and you fall through to your death.
… An hour later, you stroll through the tower, knocking off monsters and skipping traps without really thinking about it, because you have learned to perceive the meanings of routine necromantic phenomena. Archways harbor golems, zombies without heads can’t see, rotten floorboards are a bit darker than solid ones. Now you know what to look out for, and where to look to see it.
You whack the necromancer, collect the Arcane Eggbeater Of Destiny, and return with it to the College of Wizards to get your next homework assignment. [...]
What we’ve learned from visual psychology suggests that seeing involves learned, task-specific skills. It is contextual and purposive, which makes it a good fit for everyday, reasonable, routine activity. (And not such a great fit for objective rationality.)
Visual activity is not separate from the rest of what we’re doing. The phrase “hand-eye coordination” points at this. In video games, your visual actions that tell your lower-level visual processing systems what to do are just as much part of the skill of fighting a group of monsters as swinging your sword is. Shifts of visual attention, for instance, are seamlessly integrated with the rest of the killing dance. As a more mundane example, if you are looking for scissors, you’ll move your head as well as your eyes to check around the desk, shove clutter out of the way to see behind or beneath it, and eventually get up and go open a drawer and peer inside. Visual activity and bodily motions are entangled. [...]
Much of what you see, you see as something. You don’t see a textured black region of the visual image, you see a loudspeaker. Or a raven. It’s already a loudspeaker or a raven when you first experience it. Bottom-up vision has done that work for you.
What you see something as depends on your knowledge, context, and purposes. If you are familiar with moussaka and you see it on a plate in a restaurant, you’ll probably see it as moussaka. If you aren’t, you’ll probably see it as a mushy casserole. You can’t see it as moussaka, because that’s not part of your ontology. If you see moussaka on a city sidewalk, you might just see it as disgusting, potentially pathogenic slop that you want to avoid. What you see a clump of atoms as depends on what you are looking out for, and why. Although bottom-up processes can do much of the work for you, especially in the case of rigid manufactured objects like loudspeakers with standard shapes and colors, your top-down direction often also plays a critical role.
Let’s say that I spend fifteen years studying chess. [...] We will start with day one. The first thing I have to do is to internalize how the pieces move. I have to learn their values. I have to learn how to coordinate them with one another. Early on, these steps might seem complex. There is the pawn, the knight, the bishop, the rook, the queen, and the king. Each piece is unique, with its own strengths and weaknesses. Each time I look at a chess piece I have to remember what it is and how it moves. Then I look at the next piece and try to remember how that one moves. There are initially thirty-two pieces on a chessboard. To make a responsible chess decision, I have to look at all those pieces and check for captures, quick attacks, and other obvious possibilities. By the time I get to the third piece, I’m already a bit overwhelmed. By the tenth piece I have a headache, have already forgotten what I discovered about the first nine pieces, and my opponent is bored. At this point I will probably just make a move and blunder.
So let’s say that now, instead of launching from the standard starting position, we begin on an empty board with just a king and a pawn against a king. These are relatively simple pieces. I learn how they both move, and then I play around with them for a while until I feel comfortable. Then, over time, I learn about bishops in isolation, then knights, rooks, and queens. Soon enough, the movements and values of the chess pieces are natural to me. I don’t have to think about them consciously, but see their potential simultaneously with the figurine itself. Chess pieces stop being hunks of wood or plastic, and begin to take on an energetic dimension. Where the piece currently sits on a chessboard pales in comparison to the countless vectors of potential flying off in the mind. I see how each piece affects those around it. Because the basic movements are natural to me, I can take in more information and have a broader perspective of the board. Now when I look at a chess position, I can see all the pieces at once. The network is coming together.
Next I have to learn the principles of coordinating the pieces. I learn how to place my arsenal most efficiently on the chessboard and I learn to read the road signs that determine how to maximize a given soldier’s effectiveness in a particular setting. These road signs are principles. Just as I initially had to think about each chess piece individually, now I have to plod through the principles in my brain to figure out which apply to the current position and how. Over time, that process becomes increasingly natural to me, until I eventually see the pieces and the appropriate principles in a blink. While an intermediate player will learn how a bishop’s strength in the middlegame depends on the central pawn structure, a slightly more advanced player will just flash his or her mind across the board and take in the bishop and the critical structural components. The structure and the bishop are one. Neither has any intrinsic value outside of its relation to the other, and they are chunked together in the mind.
This new integration of knowledge has a peculiar effect, because I begin to realize that the initial maxims of piece value are far from ironclad. The pieces gradually lose absolute identity. I learn that rooks and bishops work more efficiently together than rooks and knights, but queens and knights tend to have an edge over queens and bishops. Each piece’s power is purely relational, depending upon such variables as pawn structure and surrounding forces. So now when you look at a knight, you see its potential in the context of the bishop a few squares away. Over time each chess principle loses rigidity, and you get better and better at reading the subtle signs of qualitative relativity. Soon enough, learning becomes unlearning. The stronger chess player is often the one who is less attached to a dogmatic interpretation of the principles. This leads to a whole new layer of principles—those that consist of the exceptions to the initial principles. Of course the next step is for those counterintuitive signs to become internalized just as the initial movements of the pieces were. The network of my chess knowledge now involves principles, patterns, and chunks of information, accessed through a whole new set of navigational principles, patterns, and chunks of information, which are soon followed by another set of principles and chunks designed to assist in the interpretation of the last. Learning chess at this level becomes sitting with paradox, being at peace with and navigating the tension of competing truths, letting go of any notion of solidity. [...]
Most people would be surprised to discover that if you compare the thought process of a Grandmaster to that of an expert (a much weaker, but quite competent chess player), you will often find that the Grandmaster consciously looks at less, not more. That said, the chunks of information that have been put together in his mind allow him to see much more with much less conscious thought. So he is looking at very little and seeing quite a lot. This is the critical idea.
(The excerpts above are from Gary Klein, David Chapman, and Josh Waitzkin, respectively.)
Thanks! My current model is that
The frontal lobe does involve RL, and this is used to think high-value thoughts and take high-value actions;
One reason that thoughts / actions can be high-value is that they acquire valuable information, and one way they can do this is by directing saccades and attention towards the parts of the visual field (or other sensory input) where valuable information is;
That corresponding sensory input processing area is still doing predictive learning, but it uses a higher learning rate when it is the focus of top-down attention, and therefore tends to develop a rich pattern-recognizing vocabulary that is lopsidedly tailored towards recognizing the types of patterns that carry valuable information from the perspective of the RL-based frontal lobe.
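The attention-gating idea in the last point can be sketched as a toy simulation (my own construction, purely illustrative): two sensory channels both learn by the same predictive update, but the channel that receives top-down attention uses a higher learning rate, so its model of the world ends up more accurate even though reward never touches the sensory weights directly.

```python
# Toy sketch of attention-gated predictive learning: both channels use the
# identical predictive update; attention only modulates the learning rate.
import random

random.seed(1)

def run(steps=5000, lr_attended=0.2, lr_unattended=0.01):
    true = [0.0, 0.0]   # slowly drifting "state of the world" per channel
    pred = [0.0, 0.0]   # the sensory system's running prediction per channel
    err = [0.0, 0.0]    # accumulated tracking error per channel
    for _ in range(steps):
        for ch in (0, 1):
            true[ch] += random.gauss(0.0, 0.05)      # the world drifts
            obs = true[ch] + random.gauss(0.0, 0.1)  # noisy observation
            err[ch] += abs(pred[ch] - true[ch])
            # Channel 0 is where the (RL-trained) frontal lobe points attention;
            # attention gates the learning rate, not the learning signal itself.
            lr = lr_attended if ch == 0 else lr_unattended
            pred[ch] += lr * (obs - pred[ch])        # same predictive update
    return err[0] / steps, err[1] / steps

attended_err, unattended_err = run()
print(attended_err < unattended_err)  # True: the attended channel tracks reality better
```

The point of the sketch is that the sensory updates themselves stay purely predictive; the only footprint of RL is in which channel gets the high learning rate.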
So RL is involved, just a step removed. (Maybe my post title was bad. :-P ) Do you think that’s an adequate involvement of RL to explain those and other examples?
Maybe. :) I don’t have much of a position on “which part of the brain is the sensory processing-related reinforcement learning implemented in”, just on the original claim of “we shouldn’t expect to find RL involved in sensory processing”.
That’s fair; the first sentence now has a caveat to that effect. Thanks again!