How exactly does modulating the reward signal lead to rapid and flexible changes of behaviour?
On my current models, I would split it into three mechanisms:
The first involves interoceptive signals going from the hypothalamus / brainstem / etc. to the cortical mantle (neocortex, hippocampus, etc.), where at least some of them seem to be treated as the same type of input data as any other sensory signal going to the cortical mantle (vision, etc.). Thus, you’re learning a predictive world-model that includes (among other things) predicting / explaining your current (and immediate-future) interoceptive state.
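To make that concrete, here’s a toy sketch (not the author’s actual proposal; the linear model, dimensions, and all variable names are made-up illustration) of treating interoception as just another sensory channel: the learner concatenates “vision” and “interoception” into one input vector and trains a predictive model of the next interoceptive state, exactly as it would for any other sensory prediction target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the "cortical mantle" receives exteroceptive input (vision)
# and interoceptive input (hunger, temperature, ...) as one combined
# sensory vector, and learns to predict the next interoceptive state.
N_VISION, N_INTERO = 8, 3
W = np.zeros((N_INTERO, N_VISION + N_INTERO))  # learned world-model weights

def predict_next_intero(vision, intero):
    """Predict the immediate-future interoceptive state from all senses."""
    return W @ np.concatenate([vision, intero])

# Hypothetical environment: the next interoceptive state happens to be a
# fixed linear function of the current combined sensory state.
true_W = rng.normal(size=(N_INTERO, N_VISION + N_INTERO)) * 0.1

lr = 0.05
for _ in range(2000):
    vision = rng.normal(size=N_VISION)
    intero = rng.normal(size=N_INTERO)
    x = np.concatenate([vision, intero])
    error = true_W @ x - W @ x          # predictive (world-model) error
    W += lr * np.outer(error, x)        # simple delta-rule update

# Check how closely the learned model matches the true mapping on new input.
v, i = rng.normal(size=N_VISION), rng.normal(size=N_INTERO)
err = np.abs(true_W @ np.concatenate([v, i]) - predict_next_intero(v, i)).max()
```

The point of the sketch is just that nothing in the learning rule distinguishes the interoceptive input dimensions from the visual ones; interoception is modeled “for free” by the same predictive machinery.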
The second involves the “main” scalar RL system. We learn how interoceptive inputs are relevant to the rewarding-ness of different thoughts and actions in the same way that we learn how any other sensory input is relevant to the rewarding-ness of different thoughts and actions. This is closely related to so-called “incentive learning” in psychology. I have a half-written draft post on incentive learning; I can share it if you want, since it doesn’t seem like I’m going to finish it anytime soon. :-P
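A minimal illustration of that idea (my own toy example, not anything from the incentive-learning literature): a tabular value learner whose state includes an interoceptive variable. The reward for eating depends on hunger, and the scalar RL update learns that contingency the same way it would learn about any external cue.

```python
import numpy as np

rng = np.random.default_rng(1)

# States: hunger = 0 (sated) or 1 (hungry). Actions: 0 = ignore food, 1 = eat.
# Eating while hungry is rewarding; eating while sated is not. The
# interoceptive variable is just another dimension of the learned Q-table.
Q = np.zeros((2, 2))  # Q[hunger, action]
lr = 0.1

def reward(hunger, action):
    return 1.0 if (action == 1 and hunger == 1) else 0.0

for _ in range(1000):
    hunger = rng.integers(2)
    action = rng.integers(2)  # explore uniformly
    r = reward(hunger, action)
    Q[hunger, action] += lr * (r - Q[hunger, action])  # one-step value update
```

After training, “eat” is the preferred action only in the hungry state: the system has learned how an interoceptive input bears on the rewarding-ness of an action, with no mechanism beyond ordinary reward learning.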
The third involves what I call “visceral predictions”. The predictions are happening in part of the extended amygdala, part of the lateral septum, and part of the nucleus accumbens shell, more or less. These are probably hundreds of little trained models, each learning to predict things like “Given what the cortex is doing right now, am I going to taste salt soon? Am I going to get goosebumps soon? Etc.” These signals go down to the brainstem, where they can and do feed into the reward function. I think this is at least vaguely akin to your reward basis paper. :)
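One way to sketch that architecture (purely illustrative; the linear predictors, the need-weighting scheme, and all names are my assumptions, loosely in the spirit of a reward-basis decomposition): an ensemble of small models each maps the current cortical state to one predicted bodily event, and the “brainstem” combines those predictions into a scalar reward, weighted by current physiological needs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ensemble of small "visceral predictors": each reads the current cortical
# state and outputs one prediction ("will I taste salt soon?", "goosebumps
# soon?", ...). Here they are fixed linear models, as if already trained.
N_CORTEX, N_PREDICTORS = 16, 5
V = rng.normal(size=(N_PREDICTORS, N_CORTEX)) * 0.1

def visceral_predictions(cortical_state):
    """One scalar prediction per upcoming bodily event."""
    return V @ cortical_state

def brainstem_reward(cortical_state, need_weights):
    """Combine visceral predictions into a scalar reward, weighted by
    current physiological needs (cf. a reward-basis decomposition)."""
    return float(need_weights @ visceral_predictions(cortical_state))

# The same cortical state (same "thought") yields different reward
# depending on the body's current needs:
state = rng.normal(size=N_CORTEX)
salt_need = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # only salt matters now
no_needs = np.zeros(N_PREDICTORS)                # fully sated

r_hungry = brainstem_reward(state, salt_need)
r_sated = brainstem_reward(state, no_needs)
```

The design point is that the reward function can change instantly and flexibly when the need weights change, even though none of the predictors themselves retrain, which is one candidate answer to the opening question.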