Various thoughts:
It would make a lot of sense to me if norepinephrine acted as a Q-like signal for negative rewards. I don't have any neuroscience evidence for this, but it makes sense to me that negative rewards and positive rewards are very different for animals and would benefit from different approaches. I once ran some Q-learning experiments on the classic Taxi environment to see if I could make a satisficing agent (one that achieves a certain reward less than the maximum achievable and then rests). The agent responded by taking the illegal actions that give highly negative rewards in the Taxi environment and hustling as hard as possible the rest of the time, so that its total came out to the specified reward. So I had to add a Q-function solely for negative rewards to get the desired behavior. Given that actual animals need to rest in a way that RL agents don't have to in most environments, it makes sense to me that Q-learning on its own is not a good brain architecture.
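For concreteness, here is a toy sketch of what "a Q-function solely for negative rewards" could look like. It is not the experiment described above: the class and function names, the SARSA-style update, the "rest once the target is met" rule, and the lexicographic choice (avoid expected cost first, then seek benefit) are all my own assumptions. It also guesses at one plausible failure mode: if a single combined Q is used and the agent picks whichever action's value is closest to the remaining target, a penalty action can look attractive because it "burns off" surplus reward.

```python
from typing import Dict, Hashable, List, Tuple

Key = Tuple[Hashable, int]  # (state, action)


def split_reward(r: float) -> Tuple[float, float]:
    """Split a scalar reward into non-negative (benefit, cost) parts."""
    return max(r, 0.0), max(-r, 0.0)


class TwoChannelSatisficer:
    """Hypothetical satisficer with separate benefit and cost Q-tables."""

    def __init__(self, actions: List[int], rest_action: int,
                 alpha: float = 0.1, gamma: float = 0.95, tol: float = 1e-6):
        self.actions = actions
        self.rest_action = rest_action
        self.alpha, self.gamma, self.tol = alpha, gamma, tol
        self.q_pos: Dict[Key, float] = {}  # expected discounted benefit
        self.q_neg: Dict[Key, float] = {}  # expected discounted cost (>= 0)

    def _get(self, table: Dict[Key, float], s: Hashable, a: int) -> float:
        return table.get((s, a), 0.0)

    def update(self, s, a, r, s_next, a_next, done) -> None:
        """SARSA-style TD update, applied to each channel separately."""
        benefit, cost = split_reward(r)
        for table, part in ((self.q_pos, benefit), (self.q_neg, cost)):
            bootstrap = 0.0 if done else self.gamma * self._get(table, s_next, a_next)
            old = self._get(table, s, a)
            table[(s, a)] = old + self.alpha * (part + bootstrap - old)

    def act(self, s, earned: float, target: float) -> int:
        """Rest once the target is met; otherwise avoid cost first, then seek benefit."""
        if earned >= target:
            return self.rest_action
        min_cost = min(self._get(self.q_neg, s, a) for a in self.actions)
        safe = [a for a in self.actions
                if self._get(self.q_neg, s, a) <= min_cost + self.tol]
        return max(safe, key=lambda a: self._get(self.q_pos, s, a))


def single_q_satisficer(q: Dict[Key, float], s, actions: List[int],
                        remaining: float) -> int:
    """Contrast: one combined Q; pick the action whose value is closest to the
    remaining target. Penalty actions can be used to burn off surplus reward."""
    return min(actions, key=lambda a: abs(q.get((s, a), 0.0) - remaining))


# Hand-set, hypothetical values. Actions: 0 = rest, 1 = work (+20),
# 2 = illegal move (-10) followed by work (+20). Remaining target = 10.
s = "start"
combined = {(s, 0): 0.0, (s, 1): 20.0, (s, 2): 10.0}
agent = TwoChannelSatisficer(actions=[0, 1, 2], rest_action=0)
agent.q_pos.update({(s, 0): 0.0, (s, 1): 20.0, (s, 2): 20.0})
agent.q_neg.update({(s, 0): 0.0, (s, 1): 0.0, (s, 2): 10.0})
print(single_q_satisficer(combined, s, [0, 1, 2], remaining=10.0))  # 2 (penalty)
print(agent.act(s, earned=0.0, target=10.0))  # 1 (work)
```

With one combined value, the penalty route lands exactly on the target, so the closest-to-target rule picks it; keeping costs in their own table lets the agent rule those actions out before it looks at benefit at all.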
Dopamine receptors in V1 kind of make sense if you want to visually predict reward-like properties of objects in the environment. Like something could look tasty or not tasty, maybe.
That's an interesting anecdote about the satisficing thing! I don't think it quite applies to animals, because I don't think animals are maximizing the sum of future rewards (see here). Anyway, the system is already set up with separate channels throughout for good things happening vs bad things happening. There's something I believe but haven't written about, where the striatum sends out separate cost and benefit estimates rather than just adding them up; and in the "assessor" zone here there are also different channels, because good vs bad things have different autonomic consequences (e.g. sympathetic vs parasympathetic). Also, this says norepinephrine is slow-acting, which suggests that it doesn't implement a learning rule tied to particular thoughts, actions, and events.
But the article says it does affect learning rate, arousal, and whatnot. So maybe something like: NE and acetylcholine both signal "important things are happening now, let's increase learning rate, tune the dial towards fast reactions at the expense of energy efficiency, etc. etc.", but acetylcholine is fast and local ("important things are happening at this particular part of the visual field right now") and NE is slow and global ("I am in a generally important situation")? Dunno, just speculating based on one abstract.
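To make the fast/local vs slow/global distinction concrete, here is a toy sketch. Everything in it is an assumption for illustration, not anything from the abstract: the `GainModulator` name, the use of a scalar "surprise" as the input, the exponential-moving-average time constants, and the multiplicative combination into a learning rate are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class GainModulator:
    """Hypothetical learning-rate modulator with two importance signals."""
    base_lr: float = 0.01
    slow_rate: float = 0.01  # NE-like: slow, a single global value
    fast_rate: float = 0.5   # ACh-like: fast, one value per location
    ne: float = 0.0
    ach: Dict[Hashable, float] = field(default_factory=dict)

    def observe(self, loc: Hashable, surprise: float) -> None:
        # Global signal: slow moving average of surprise, wherever it happens.
        self.ne += self.slow_rate * (surprise - self.ne)
        # Local signal: fast moving average of surprise at this location only.
        old = self.ach.get(loc, 0.0)
        self.ach[loc] = old + self.fast_rate * (surprise - old)

    def learning_rate(self, loc: Hashable) -> float:
        # Both signals scale the base rate up; unvisited locations get no local boost.
        return self.base_lr * (1.0 + self.ne) * (1.0 + self.ach.get(loc, 0.0))


m = GainModulator()
m.observe("left", surprise=1.0)
print(m.learning_rate("left"))   # big jump: local signal reacts at once
print(m.learning_rate("right"))  # small bump: only the slow global signal moved
```

After a single surprising event on the left, the local signal has already moved halfway toward it, while the global signal has barely shifted; it would take a sustained run of surprises everywhere to raise the global level much.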