I really liked this post. Not used to thinking about brain algorithms, but I believe I followed most of your points.
That being said, I’m not sure I get how your hypotheses explain the actual behavior of the rats. Just looking at hypothesis 3, you posit that thinking about salt gets an improved reward, and so does actions that make the rat expect salt-tasting. But that doesn’t remove the need for exploration! The neocortex still needs to choose a course of action before getting a reward. Actually, if thinking about salt is rewarded anyway, this might reinforce any behavior decided after thinking about salt. And if the interpretability is better and only rewards actions that are expected to result in tasting salt, there is still need for exploring to find such a plan and having it reinforced.
You’re right. “Thinking about salt is rewarded anyway” doesn’t make sense and isn’t right. You’re one of two people to call me out on it, and I just posted a long comment replying to the other here. Thank you!! I just added a correction to the article:
(UPDATE: Commenters point out that this description isn’t quite right—it doesn’t make sense to say that the idea of tasting salt is rewarding per se. Rather, when the rat starts expecting to taste salt, the subcortex sends a positive reward-prediction-error signal, and conversely if the rat stops expecting to taste salt, the subcortex sends a negative reward-prediction-error signal. Something like that. Sorry for the mistake / confusion. Thanks commenters!)
I really liked this post. Not used to thinking about brain algorithms, but I believe I followed most of your points.
That being said, I’m not sure I get how your hypotheses explain the actual behavior of the rats. Just looking at hypothesis 3, you posit that thinking about salt gets an improved reward, and so does actions that make the rat expect salt-tasting. But that doesn’t remove the need for exploration! The neocortex still needs to choose a course of action before getting a reward. Actually, if thinking about salt is rewarded anyway, this might reinforce any behavior decided after thinking about salt. And if the interpretability is better and only rewards actions that are expected to result in tasting salt, there is still need for exploring to find such a plan and having it reinforced.
Am I getting something wrong?
You’re right. “Thinking about salt is rewarded anyway” doesn’t make sense and isn’t right. You’re one of two people to call me out on it, and I just posted a long comment replying to the other here. Thank you!! I just added a correction to the article:
Does that answer your question?