The above is describing the model-free component of learning reward-function dependent policies. The Morrison and Berridge salt experiment is demonstrating the model-based side, which probably comes from imagining specific outcomes and how they’d feel.
This is where I disagree! I don't think the Morrison and Berridge experiment demonstrates the model-based side. It is consistent with model-based RL, but it is also consistent with model-free algorithms that can flexibly adapt to changing reward functions, such as linear RL. Personally, I think the latter is more likely, since this is such a low-level response, one that can be modulated entirely by subcortical systems, and so seems unlikely to require model-based planning to work.
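To make the linear RL point concrete, here is a toy sketch (in the style of Piray & Daw's linear RL, using a linearly solvable MDP; all states, transition probabilities, and reward values are hypothetical). The key property: a cached "default representation" matrix lets you recompute optimal values for any new reward function with a single matrix multiply, so revaluation after salt depletion needs no model-based search at decision time.

```python
import numpy as np

lam = 1.0  # temperature of the linearly solvable MDP

# Hypothetical toy task: 3 internal states, 2 terminal outcomes
# ("salty" vs "neutral"). Default-policy transition probabilities:
P_II = np.array([[0.0, 0.5, 0.0],   # internal -> internal
                 [0.0, 0.0, 0.5],
                 [0.0, 0.0, 0.0]])
P_IT = np.array([[0.5, 0.0],        # internal -> terminal
                 [0.0, 0.5],
                 [0.5, 0.5]])

r_internal = np.zeros(3)                    # no reward at internal states
Q = np.diag(np.exp(r_internal / lam))

# Default representation, cached once -- the "model-free-ish" knowledge.
# Solves z_I = Q (P_II z_I + P_IT z_T)  =>  z_I = M P_IT z_T
M = np.linalg.inv(np.eye(3) - Q @ P_II) @ Q

def values(r_terminal):
    """Optimal state values for ANY terminal reward: one matrix multiply."""
    z = M @ P_IT @ np.exp(r_terminal / lam)
    return lam * np.log(z)

# Before depletion: the salty outcome is aversive.
v_before = values(np.array([-5.0, 0.0]))
# After depletion: same cached M, new reward vector -- instant revaluation.
v_after = values(np.array([5.0, 0.0]))
```

Nothing here imagines specific outcomes or rolls out a model; the revaluation falls out of the cached matrix `M`, which is why this kind of flexibility is also consistent with a subcortically implementable, model-free mechanism.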