The above is describing the model-free component of learning reward-function dependent policies. The Morrison and Berridge salt experiment is demonstrating the model-based side, which probably comes from imagining specific outcomes and how they’d feel.
This is where I disagree! I don't think the Morrison and Berridge experiment demonstrates the model-based side. It is consistent with model-based RL, but it is also consistent with model-free algorithms that can flexibly adapt to changing reward functions, such as linear RL. Personally, I think the latter is more likely, since this is such a low-level response, one that can be modulated entirely by subcortical systems, and so seems unlikely to require model-based planning to work.
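To make the linear RL point concrete, here is a toy sketch (in the style of Piray & Daw's linear RL, using a linearly solvable MDP; all states, transition probabilities, and reward values are hypothetical). The key property: a cached "default representation" matrix lets you recompute optimal values for any new reward function with a single matrix multiply, so revaluation after salt depletion needs no model-based search at decision time.

```python
import numpy as np

lam = 1.0  # temperature of the linearly solvable MDP

# Hypothetical toy task: 3 internal states, 2 terminal outcomes
# ("salty" vs "neutral"). Default-policy transition probabilities:
P_II = np.array([[0.0, 0.5, 0.0],   # internal -> internal
                 [0.0, 0.0, 0.5],
                 [0.0, 0.0, 0.0]])
P_IT = np.array([[0.5, 0.0],        # internal -> terminal
                 [0.0, 0.5],
                 [0.5, 0.5]])

r_internal = np.zeros(3)                    # no reward at internal states
Q = np.diag(np.exp(r_internal / lam))

# Default representation, cached once -- the "model-free-ish" knowledge.
# Solves z_I = Q (P_II z_I + P_IT z_T)  =>  z_I = M P_IT z_T
M = np.linalg.inv(np.eye(3) - Q @ P_II) @ Q

def values(r_terminal):
    """Optimal state values for ANY terminal reward: one matrix multiply."""
    z = M @ P_IT @ np.exp(r_terminal / lam)
    return lam * np.log(z)

# Before depletion: the salty outcome is aversive.
v_before = values(np.array([-5.0, 0.0]))
# After depletion: same cached M, new reward vector -- instant revaluation.
v_after = values(np.array([5.0, 0.0]))
```

Nothing here imagines specific outcomes or rolls out a model; the revaluation falls out of the cached matrix `M`, which is why this kind of flexibility is also consistent with a subcortically implementable, model-free mechanism.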