Awesome post! I also happen to have tried to distill the links between RPE and phasic dopamine in “Prefrontal Cortex as a Meta-RL System” on this blog.
In particular, I reference this paper on DL in the brain & this other one for RL in the brain. Also, I feel like Part 3 of the RL book, on the links between RL and neuroscience, is a great resource for this.
Thanks!
If you Ctrl-F the post you’ll find my little paragraph on how my take differs from Marblestone, Wayne, Kording 2016.
I haven’t found “meta-RL” to be a helpful way to frame either the bandit thing or the follow-up paper relating it to the brain, more-or-less for reasons here. The normal RL / POMDP expectation is that actions have to depend on previous observations (think of playing an Atari game). I guess we can call that “learning”, but then we have to say that a large fraction of every RL paper ever is actually a meta-RL paper. More importantly, I just don’t find that thinking in those terms leads me to a better understanding of anything. But whatever, YMMV.
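To make the "actions depend on previous observations" point concrete, here is a minimal sketch. The two-armed Bernoulli bandit, the arm probabilities, and the hand-written win-stay/lose-shift rule are all illustrative assumptions, not anything taken from the bandit paper or the follow-up. The policy is a fixed function of the history with no parameter updates, yet its behavior within an episode adapts to the rewards it sees:

```python
import random


def win_stay_lose_shift(history):
    """Fixed policy: the action is a function of the observation history only.

    `history` is a list of (action, reward) pairs. No parameters are updated
    anywhere; all the within-episode adaptation comes from reading the history.
    """
    if not history:
        return 0  # arbitrary first pull
    last_action, last_reward = history[-1]
    # Repeat the last action after a reward, switch arms after no reward.
    return last_action if last_reward == 1 else 1 - last_action


def run_episode(arm_probs, steps, rng):
    """Play one two-armed bandit episode; return the (action, reward) history."""
    history = []
    for _ in range(steps):
        action = win_stay_lose_shift(history)
        reward = 1 if rng.random() < arm_probs[action] else 0
        history.append((action, reward))
    return history


# Hypothetical setup: arm 1 pays off more often than arm 0.
rng = random.Random(0)
history = run_episode(arm_probs=[0.2, 0.8], steps=1000, rng=rng)
frac_best = sum(1 for action, _ in history if action == 1) / len(history)
```

Whether to call this history-dependent behavior "learning" (and the process that produced the fixed rule "meta-learning") is exactly the terminology question above; the sketch only shows that the two descriptions can apply to the same code.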
I don’t agree with everything in the RL book chapter but it’s still interesting, thanks for the link.
Right, I just googled Marblestone, so you’re approaching it from the dopamine side and not the acetylcholine side. Without debating about words, their neuroscience paper is still at least trying to model the phasic dopamine signal as some RPE & the prefrontal network as an LSTM (IIRC), which is not acetylcholine-based. I haven’t read this post & the one linked in detail yet; I’ll comment again when I do. Thanks!