A lot of good discussion here. I especially think Steven Byrnes makes some good points (as usual). I agree that it’s worth being precise, in a technical-jargon sort of way, about the difference between ‘values’ as used in RL and ‘values’ as used by humans to talk about a particular type of reflectively endorsed desire for the shape of the current or future world. I think it is much more accurate to map RL ‘values’ to human ‘desires’, and that a term which encompasses this concept well is ‘reward’. So my recommendation would be to stick to talking about ‘reward signals’ in the human brain and in RL.
With regard to an assertion about the brain made in the post, I would like to add a couple of details. The assertion is:
“As a newborn baby, you start life with subcortical hardwired reinforcement circuitry (fixed by your genome) and a randomly initialized neocortex.”
This is roughly true, but I think it’s important for this conversation to add some details about the constraints on the randomness of the cortical initialization.
Constrained cortical randomness: in certain ways this randomness is extremely constrained (and breaking these constraints leads to a highly dysfunctional brain). Of particular importance, the long-range (multiple centimeters) inter-module connections are genetically hard-coded and established in fetal development. These long-range connections are then able to change only locally (usually << 0.5 cm) in terms of their input/output locations. No new long-range connections can be formed for the rest of the lifespan. They can be lost but not replaced. There are also rules about which subtypes of cells in which layers of the cortex can connect to which other subtypes of cells. So there’s a lot of important order there. I think when making comparisons to machine learning we should be careful to keep in mind that the brain’s plasticity is far more constrained than that of a default neural network. In particular, this means that various modules within the neocortex take on highly constrained roles which are very similar between most humans (barring dramatic damage or defects). This is quite useful for interpreting the function of parts of the cortex.
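To make the machine learning comparison concrete, here is a toy sketch of what "constrained plasticity" could look like in a network: a fixed wiring mask decides which module-to-module connections exist, learning only adjusts weights on existing connections, and a lost connection never comes back. All of the names here (`make_mask`, `masked_update`, `prune`, the three-module wiring) are made up for illustration, and this is not meant as a model of real cortex.

```python
import random

def make_mask(n_modules, wiring):
    """Build an n x n 0/1 mask from a set of allowed (src, dst) pairs."""
    return [[1 if (src, dst) in wiring else 0 for dst in range(n_modules)]
            for src in range(n_modules)]

def masked_update(weights, grads, mask, lr=0.1):
    """Gradient step that only changes connections the mask allows."""
    return [[w - lr * g * m for w, g, m in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(weights, grads, mask)]

def prune(weights, mask, src, dst):
    """Lose a connection for good; nothing in this sketch can re-add it."""
    mask[src][dst] = 0
    weights[src][dst] = 0.0

# Hypothetical "genetically specified" long-range wiring between 3 modules.
rng = random.Random(0)
wiring = {(0, 1), (1, 2), (2, 0)}
mask = make_mask(3, wiring)

# Random initialization, but only on connections that exist.
weights = [[rng.uniform(-1.0, 1.0) * m for m in row] for row in mask]
grads = [[1.0] * 3 for _ in range(3)]  # dummy gradients for the demo

weights = masked_update(weights, grads, mask)
before = weights[1][2]
prune(weights, mask, 0, 1)  # connection 0 -> 1 is lost
weights = masked_update(weights, grads, mask)
```

After the second update, the pruned connection and every connection that was never wired stay at exactly zero, while the surviving connections keep learning. A default fully connected layer has no such restriction, which is the contrast I am pointing at.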
The subcortical reward circuitry has some hardcoded aspects and some randomly initialized learning aspects. It has less of the latter than the cortex does, but not none. Also, the reward circuitry (particularly the reciprocal connections between the thalamus and the cortex, and the amygdala-prefrontal cortex links) has a lot to do with learned rewards and temporary task-specific connections between world-state prediction and reward. For instance, getting really emotionally excited about earning points in a video game depends on the prefrontal cortex interpreting those points as a task-relevant signal (video game playing being the self-assigned task). This ability to contextually self-assign tasks and to treat arbitrary sensory inputs as temporary reward signals related to these tasks is a key part of what makes humans agentic.
Classically, the symptom cluster which results from damage to the prefrontal areas responsible for self-task-and-reward-assignment is called ‘lobotomized’. Lobotomy was a primitive surgery involving deliberately damaging these frontal cortical regions of misbehaving mentally ill people, specifically to permanently remove their agency. Studies on people who have sustained deliberate or accidental damage to this region show that they can no longer successfully make and execute multi-step or abstract tasks for themselves. They can manage single-step tasks such as eating food placed in front of them when they are hungry, but not actively seeking out food that isn’t obviously available. This varies depending on the amount of damage to this area. Some patients may be able to go to the refrigerator and get food if hungry. An only slightly damaged patient may even be able to go grocery shopping, but probably wouldn’t be able to connect up a longer and more delayed / abstracted task chain around preventing hunger: for instance, earning money to use for grocery shopping, or saving shelf-stable food for times when the grocery store is temporarily unavailable.
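In RL terms, the picture above can be sketched as a fixed innate reward plus a temporary, task-dependent reward mapping that the agent assigns to itself. The sketch below is only a cartoon under my own assumptions: `hardwired_reward`, `TaskContext`, `game_points`, and the `can_self_assign` flag are hypothetical names, and the flag is a very crude stand-in for prefrontal damage.

```python
def hardwired_reward(obs):
    """Stand-in for innate reward: in this toy, only eating counts."""
    return 1.0 if obs.get("food_eaten") else 0.0

class TaskContext:
    """Holds the currently self-assigned task's temporary reward mapping."""

    def __init__(self, can_self_assign=True):
        self.can_self_assign = can_self_assign
        self.task_reward = None

    def assign(self, reward_fn):
        # An agent that cannot self-assign tasks simply ignores the request.
        if self.can_self_assign:
            self.task_reward = reward_fn

    def reward(self, obs):
        learned = self.task_reward(obs) if self.task_reward else 0.0
        return hardwired_reward(obs) + learned

def game_points(obs):
    """Temporary reward: points only matter while 'play the game' is the task."""
    return 0.1 * obs.get("points_gained", 0)

obs = {"points_gained": 50, "food_eaten": False}

intact = TaskContext()
intact.assign(game_points)

impaired = TaskContext(can_self_assign=False)
impaired.assign(game_points)

intact_r = intact.reward(obs)      # points register as rewarding
impaired_r = impaired.reward(obs)  # points are just sensory input
fed_r = impaired.reward({"food_eaten": True})  # innate reward still works
```

The same observation is rewarding for the agent that has assigned itself the game and neutral for the one that cannot, while the innate reward for food placed in front of it is unaffected in both.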
Ok, after thinking a bit, I just can’t resist throwing in another relevant point.
Attention.
Attention in the brain works entirely by inhibition (activation magnitude reduction) of the cortical areas currently judged to be irrelevant. This inhibition is not absolute; it can be overcome by a sufficiently strong surprising signal. When it isn’t being overridden, it drives the learning rate to basically zero for the unattended areas. This has been most studied (for convenience reasons) in the visual cortex, in the context of suppressing visual information currently deemed irrelevant to the tasks at hand. These tasks at hand involve both semi-hardwired instincts like predator or prey detection, and also conditional task-specific attention as mediated by the frontal cortex (with information passed via those precious long-range connections, many of which route through the thalamus).
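As a rough sketch of that gating rule (not the attention mechanism used in transformers, and not a quantitative model of cortex): unattended areas get their activation scaled down and their learning rate set to zero, unless their surprise signal is strong enough to break through. The function name `attend` and the numbers for `inhibition`, `override`, and `base_lr` are assumptions I picked for the example.

```python
def attend(activations, surprise, attended,
           inhibition=0.1, override=0.9, base_lr=0.01):
    """Return (gated activations, per-area learning rates).

    An area is suppressed only if it is unattended AND its surprise
    is below the override threshold.
    """
    gated, lrs = [], []
    for i, (act, surp) in enumerate(zip(activations, surprise)):
        suppressed = i not in attended and surp < override
        gated.append(act * inhibition if suppressed else act)
        lrs.append(0.0 if suppressed else base_lr)
    return gated, lrs

# Three hypothetical cortical areas; only area 0 is task-relevant.
activations = [1.0, 0.8, 0.5]
surprise = [0.1, 0.2, 0.95]  # area 2 carries a strongly surprising signal
gated, lrs = attend(activations, surprise, attended={0})
```

Here area 1 is damped and stops learning, while area 2 is unattended but surprising enough to pass through at full strength and keep its learning rate.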