Strong agree that I have lots of detailed thoughts about the neocortex’s algorithms and am probably implicitly leaning on them in ways that I’m not entirely aware of and not communicating well. I appreciate your working with me. :-)
I do want to walk back a bit about the reward prediction error stuff. I think the following is equivalent but simpler:
I propose that the subcortex sends a reward related to the time-derivative of how strongly the neocortex is imagining / expecting to taste salt. So the neocortex gets a reward for first entertaining the idea of tasting salt, and another incremental reward for growing that idea into a definite plan. But then it would get a negative reward for dropping that idea.
(I think this is maybe related to the Russell-Ng potential-based reward shaping thing.)
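To make the time-derivative idea concrete, here is a minimal sketch (Python; the function name, the trajectory, and all the numbers are invented purely for illustration, not anything from the discussion): the reward at each moment is just the change in an "imagining salt" intensity signal, which has the same shape as a potential-based shaping reward.

```python
def derivative_reward(imagining_intensity):
    """Hypothetical subcortical reward: the change (time-derivative) in how
    strongly the neocortex is currently imagining / expecting to taste salt."""
    rewards = []
    prev = 0.0  # baseline: not thinking about salt at all
    for current in imagining_intensity:
        rewards.append(current - prev)  # reward for ramping the idea up or down
        prev = current
    return rewards

# Illustrative trajectory: entertain the idea, firm it into a plan, then drop it.
trajectory = [0.0, 0.25, 0.75, 0.75, 0.0]
print(derivative_reward(trajectory))  # [0.0, 0.25, 0.5, 0.0, -0.75] -- sums to zero
```

Note that the positive reward for entertaining and growing the idea is exactly cancelled by the negative reward for dropping it, which comes up again later in the thread.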
the neocortex is constantly incentivized to fool the basal ganglia into predicting higher rewards
Well, there’s a couple things, I think.
First, the neocortex can’t just expect arbitrary things. It’s constrained by self-supervised learning, which throws out models that have, in the past, made predictions refuted by experience. Like, let’s say that every time you open the door, the handle makes a click. You’re going to start expecting the click to happen. You have no choice, you can’t not expect it! There are also constraints around self-consistency and other things, like you can’t visualize something that is simultaneously stationary and dancing; those two models are just inconsistent, and the message-passing algorithm will simply not allow both to be active at the same time.
Second, I think that one neocortex “thought” is made up of a large number of different components, and all of them carry separate reward predictions, which are combined (somehow) to get the attractiveness of the overall thought. Like, when you decide to step outside, you might expect to feel cold and sore muscles and wind and you’ll say goodbye to the people inside … all those different components could have different attractiveness. And an RPE changes the reward predictions of all of the ingredients of the thought, I think.
So like, if you’re very hungry but have no food, you can say to yourself “I’m going to open my cupboard and find that food has magically appeared”, and it seems like that should be a positive-RPE thought. But actually, the thought doesn’t carry a positive reward. The “I will find food” part by itself does, but meanwhile you’re also activating the thought “I am fooling myself”, and the previous 10 times that thought was active, it carried a negative RPE, so that thought carries a very negative reward prediction whenever it’s invoked. But you can’t get rid of that thought, because it previously made correct sensory predictions in this kind of situation—that’s the previous paragraph.
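Here is one way the "thought made of components, each with its own reward prediction" picture could be sketched in code. Everything here is an assumption for illustration only: the additive combination, the learning rate, and the component names are all made up.

```python
# Sketch: a thought as a bag of components, each carrying its own learned reward
# prediction; the thought's overall attractiveness combines them (here: a sum),
# and a reward prediction error (RPE) nudges every component that was active.

LEARNING_RATE = 0.1  # made-up value

reward_prediction = {
    "I will find food": +2.0,     # reinforced by past experiences of finding food
    "I am fooling myself": -1.0,  # has soaked up negative RPEs in the past
}

def attractiveness(components):
    """Combined reward prediction of the whole thought (assumed additive)."""
    return sum(reward_prediction.get(c, 0.0) for c in components)

def apply_rpe(components, actual_reward):
    """Distribute a reward prediction error over all active components."""
    rpe = actual_reward - attractiveness(components)
    for c in components:
        reward_prediction[c] = reward_prediction.get(c, 0.0) + LEARNING_RATE * rpe
    return rpe

thought = ["I will find food", "I am fooling myself"]
print(attractiveness(thought))  # +1.0 before the cupboard turns out to be empty
print(apply_rpe(thought, 0.0))  # -1.0: both components get nudged downward
```

On this picture, repeatedly entertaining the cupboard fantasy and being disappointed keeps pushing "I am fooling myself" further negative, while "I will find food" stays positive because it also gets activated in contexts where food really does appear.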
Imagining eating beans to decide how rewarding they would be doesn’t seem to get any harder if I already know I don’t have any beans. And it doesn’t feel like “thoughts of eating beans” are reinforced, it feels like I gain abstract knowledge that eating beans would be rewarded.
I would posit that it’s a subtle effect in this particular example, because you don’t actually care that much about beans. I would say “You get a subtle positive reward for entertaining the idea of eating beans, and then if you realize that you’re out of beans and put the idea aside, you get a subtle negative reward upon going back to baseline.” I think if you come up with less subtle examples it might be easier to think about, perhaps.
My general feeling is that if you just abstractly think about something for no reason in particular, it activates the models weakly (and ditto if you hear that someone else is thinking about that thing, or remember that thing in the past, etc.) If you start to think of it as “something that will happen to me”, that activates the models more strongly. If you are directly experiencing the thing right now, it activates the model most strongly of all. I acknowledge that this is vague and unjustified; I wrote this but it’s all pretty half-baked.
An additional complication is that, as above, one thought consists of a bunch of component sub-thoughts, which all impact the reward prediction. If you imagine eating beans knowing that you’re not actually going to, the “knowing that I’m not actually going to” part of the thought can have its own reward prediction, I suppose.
Oh, yet another thing is that I think maybe we have no subjective awareness of “reward”, just RPE. (Reward does not feel rewarding!) So if we (1) decide “I will imagine yummy food”, then (2) imagine yummy food, then (3) stop imagining yummy food, we get a positive reward from the second step and a negative reward from the third step, but both of those rewards were already predicted by the first step, so there’s no RPE in either the second or third step, and therefore they don’t feel positive or negative. Unless we’re hungrier than we thought, I guess...
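For what it's worth, that three-step accounting can be written out as a standard temporal-difference error. The numbers, the gamma = 1 choice, and the state names below are toy assumptions, not part of the proposal itself.

```python
# Toy TD-error accounting for: (1) decide to imagine yummy food, (2) imagine it,
# (3) stop imagining it. V holds the predicted future reward from each state
# onward; all numbers are made up, and gamma is taken to be 1 for simplicity.

R = 5.0                                  # reward for imagining yummy food
V = {"decided": 0.0,                     # +R then -R still to come: nets to zero
     "imagining": -R,                    # only the -R for stopping remains
     "stopped": 0.0}                     # back at baseline

steps = [("decided", "imagining", +R),   # step (2): positive reward arrives
         ("imagining", "stopped", -R)]   # step (3): negative reward arrives

for s, s_next, reward in steps:
    rpe = reward + V[s_next] - V[s]      # standard TD error with gamma = 1
    print(f"{s} -> {s_next}: reward {reward:+.1f}, RPE {rpe:+.1f}")
# Both RPEs come out to zero: the rewards were already predicted at step (1),
# so neither step "feels" positive or negative.
```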
it seems like there’s a floor on the effect size, where arbitrarily low probability eventually stops weakening the effect
Yeah sure, if a model is active at all, it’s active above some threshold, I think. Like, if the neuron fires once every 10 minutes, then, well, the model is not actually turned on and affecting the brain. This is probably related to our inability to deal with small probabilities.
Meanwhile, it’s quite possible to trigger physiological responses by imagining things.
Yes, I would say the “neocortex is imagining / expecting to taste salt” signal has many downstream effects, one of which is affecting the reward signal, another of which is causing salivation.
This doesn’t seem like it stops working if you keep doing it
Really? I think that if some thought causes you to salivate, but doesn’t actually ever lead to eating for hours afterwards, and this happens over and over again for weeks, your systems would learn to stop salivating. I guess I don’t know for sure. Didn’t Pavlov do that experiment? See also my “scary movie” example here.
the rat starts salivating and feels something in its stomach that it previously learned means “my body wants the food” and concludes eating salt would be a good idea
Basically, there could be a non-reward signal that indicates “whatever you’re thinking of, eat it and you’ll feel rewarded”. And that could be learned from eating other food over the course of life. Yeah, sure, that could work. I think it would sorta amount to the same thing, because the neocortex would just turn that signal into a reward prediction, and register a positive RPE when it sees it. So why not just cut out the middleman and create a positive RPE by sending a reward? I guess you would argue that if it’s not at all rewarding to imagine food that you know you’re not going to eat, your theory fits that better.
Still thinking about it.
Thanks again, you’re being very helpful :-)
Glad to hear this is helpful for you too :)
I didn’t really follow the time-derivative idea before, and since you said it was equivalent I didn’t worry about it :p. But either it’s not really equivalent or I misunderstood the previous formulation, because I think everything works for me now.
So if we (1) decide “I will imagine yummy food”, then (2) imagine yummy food, then (3) stop imagining yummy food, we get a positive reward from the second step and a negative reward from the third step, but both of those rewards were already predicted by the first step, so there’s no RPE in either the second or third step, and therefore they don’t feel positive or negative. Unless we’re hungrier than we thought, I guess...
Well, what exactly happens if we’re hungrier than we thought?
(1) “I will imagine food”: No reward yet, expecting moderate positive reward followed by moderate negative reward.
(2) [Imagining food]: Large positive reward, but now expecting large negative reward when we stop imagining, so no RPE on previous step.
(3) [Stops imagining food]: Large negative reward as expected, no RPE for previous step.
The size of the reward can then be informative, but not actually rewarding (since it predictably nets to zero over time). The neocortex obtains hypothetical reward information from the subcortex, without actually extracting a reward—which is the thing I’ve been insisting had to be possible. Turns out we don’t need to use a separate channel! And the subcortex doesn’t have to know or care whether it’s receiving a genuine prediction or an exploratory imagining from the neocortex—the incentives are right either way.
(We do still need some explanation of why the neocortex can imagine (predict?) food momentarily but can’t keep doing it forever, avoid step (3), and pocket a positive RPE after step (2). Common sense suggests one: keeping such a thing up is effortful, so you’d be paying ongoing costs for a one-time gain, and unless you can keep it up forever the reward still nets to zero in the end.)
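One way to sanity-check the "predictably nets to zero" point: with a reward equal to the change in imagining intensity, the total reward over any imagine-then-stop episode telescopes to final-minus-initial intensity, regardless of how big the peak was. A toy check, with trajectories and numbers invented for illustration:

```python
# Toy check of "predictably nets to zero over time": if reward = change in
# imagining intensity, the total over an episode telescopes to final - initial.
# A mild daydream and an intense craving have very different reward *sizes*
# along the way (that's the informative part), but both sum to zero once the
# imagining returns to baseline.

def total_reward(intensities):
    return sum(b - a for a, b in zip(intensities, intensities[1:]))

mild_daydream   = [0.0, 0.25, 0.5, 0.0]
intense_craving = [0.0, 0.5, 2.0, 5.0, 0.0]

print(total_reward(mild_daydream))    # 0.0
print(total_reward(intense_craving))  # 0.0 -- same total, very different peaks
```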