a comparatively elaborate world-modeling infrastructure is already in place, having been hardcoded by the genome
is an obvious model, given that most of the brain is NOT neocortex, but much more ancient structures. Somewhere inside there is an input to the nervous system, SALT_CONTENT_IN_BLOOD, which gets translated into a less graded and more binary “salt taste GOOD” or “salt taste BAD”, and the “Need SALT” on/off urge. When the rat tastes the salt water from the tap for the first time, what gets recorded is not (just) “tastes good” or “tastes bad” but “tastes SALTY”, which is post-processed into a behavior based on whether the salty taste is good or bad. Together with the urge to seek salt when low, and the memory of the salty taste from before, this would explain the rats’ behavior pretty well.
You don’t need a fancy neural net and reinforcement learning here, the logic seems quite basic.
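For concreteness, the model described above could be sketched roughly as follows. This is only an illustration: the class, the threshold value, and the method names are hypothetical, and only SALT_CONTENT_IN_BLOOD and the SALTY / GOOD / BAD labels come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SALT_LOW_THRESHOLD = 0.3  # hypothetical set point, arbitrary units


@dataclass
class Rat:
    salt_content_in_blood: float
    # Memory stores the raw taste of each source ("SALTY"), not a good/bad verdict.
    taste_memory: Dict[str, str] = field(default_factory=dict)

    def need_salt(self) -> bool:
        """Collapse the graded blood signal into an on/off urge."""
        return self.salt_content_in_blood < SALT_LOW_THRESHOLD

    def taste(self, source: str, flavor: str) -> None:
        """Record what the source tasted like, independent of current need."""
        self.taste_memory[source] = flavor

    def choose(self) -> Optional[str]:
        """Seek a remembered salty source only while the urge is on."""
        if not self.need_salt():
            return None
        for source, flavor in self.taste_memory.items():
            if flavor == "SALTY":
                return source
        return None
```

The point of storing "SALTY" rather than "BAD" is that the good/bad judgment is deferred until decision time, when the current salt level is known.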
most of the brain is NOT neocortex, but much more ancient structures
Well, 75% of the human brain by weight is neocortex. Not sure what the ratio is for rats. I disagree with “much more ancient structures”; my current understanding is that the neocortex is not all that different from the pallium in birds and lizards, and there are homologous structures even in lampreys I think. Some say the neurons in a bird pallium are arranged differently in space from the neurons in a mammal neocortex, but the neurons are connected into the same circuits doing the same computations. I don’t really know, I think it’s still an open question.
Dayan & Berridge cite this old study with decorticate rats (rats with their neocortexes surgically removed (yikes)). I read it (well, parts of it) a few days ago when researching this post. Unfortunately they didn’t do the types of tests I’m talking about here, where the rats go for the salt based on memory alone (indeed, based on an unpleasant memory). I didn’t think anything in that paper contradicted what I wrote, unless I misunderstood something, which is entirely possible. It would be interesting if the study I described in this post were repeated with decorticate rats. My strong expectation is that it wouldn’t work. It would be an even stronger expectation if both the neocortex and hippocampus were removed; as I mentioned, I count the hippocampus as part of the “neocortex subsystem”.
You don’t need a fancy neural net and reinforcement learning here
I think I get what you’re trying to say, but for what it’s worth, I think it’s well-established that mammals (and presumably many other animals) do a kind of reinforcement learning, famously involving dopamine neurons doing TD learning, or at least something related to TD learning. Of course animal brains do other things too: RL is not a grand all-encompassing theory of animal brains. But RL is one thing that they do. So putting reinforcement learning into the story is not a burdensome detail for which my model should be penalized; we already know that RL is present in the rat brain! Likewise, it’s an established fact that the amygdala does supervised learning, as far as I understand from sources like this.
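For readers who haven’t seen it, here is a minimal sketch of the textbook TD(0) update. It illustrates the algorithm only and is not a claim about what dopamine neurons literally compute; the state names, learning rate, and discount factor are made up for the example.

```python
def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one TD(0) step and return the prediction error (delta)."""
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta


# Hypothetical three-step episode: a cue, then food (reward 1), then the end.
values = {"cue": 0.0, "food": 0.0, "end": 0.0}
for _ in range(200):
    td_update(values, "cue", "food", reward=0.0)
    td_update(values, "food", "end", reward=1.0)
```

After enough repetitions the value of "food" approaches 1 and the value of "cue" approaches gamma times that, so the prediction moves backward in time from the reward to the cue that precedes it.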
Somewhere inside there is an input to the nervous system, SALT_CONTENT_IN_BLOOD which gets translated into less graded and more binary “salt taste GOOD” or “salt taste BAD”, and the “Need SALT” on/off urge. When the rat tastes the salt water from the tap for the first time, what gets recorded is not (just) “tastes good” or “tastes bad” but “tastes SALTY”, which is post-processed into a behavior based on whether the salty taste is good or bad. Together with the urge to seek salt when low, and the memory of the salty taste from before, this would explain the rats’ behavior pretty well.
I basically agree with that and would say that I am trying to flesh out the details. But let me say why I think it’s not such a simple computation.
Photons hit the retina. The brain has to turn these photons into a predictive model of the world. Rat-cage-levers were not present in the evolutionary environment, so the predictive model needs to have a flexible way to learn new concepts / things and relations between them, including causal relations, spatial relations, temporal relations, and so on. So there’s this big predictive world-model in the brain, and the genome has no idea what’s in it. It’s just a bunch of unlabeled items, and they only have semantic meaning through their web of connections. If Entity #5785238 is active, then it’s likely that Entity #6873298 is also active. Etc. etc.
Again, levers are an arbitrary, learned object in the world-model, not hardwired. The affordance of “pressing the saltwater lever” is Entity #123456 in the world-model, let’s say. The rat learns that Entity #123456 causes the taste of salt. How is that information learned, and stored, and how is it used to drive behavior in a salt-deprivation-dependent way? That’s the question.
Like, you say “urge to seek salt” is one of your ingredients. OK, sure, but an “urge” is an intuitive notion. What is an “urge” in terms of an algorithm? How do you flesh it out? I know how I would answer that question: I would flesh it out by introducing the idea of reinforcement learning. I would say that every entity in the world-model (well, the part of the world model that’s stored in the frontal lobe) carries a scalar reward prediction, and these numbers are updated by a reward signal, and you decide whether to do an action or think a thought based on whether it predicts more reward than what you would be doing otherwise. And now I can answer the question: what’s an urge? An “urge to do X” is when the ground-truth reward signal shifts to make “doing X” and “thinking about doing X” suddenly more rewarding than usual. I’m not giving all the details here, but I feel like I have a nice outline of a picture in my head here of “urge”, and it bridges all the way from neurons to algorithms to behavior. You’re saying that there’s an “urge”, but you’re also saying that you don’t need “fancy” reinforcement learning to implement it. OK, how then? What is the “urge” under the hood? That’s not rhetorical. If you have an answer, I’m very interested and would love to brainstorm with you. :-)
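Here is a toy sketch of that outline, under heavy assumptions: the class, the function names, the learning rate, and the reward values are all invented for illustration, and the sketch simply assumes that thinking about an entity produces a reward signal for it.

```python
class WorldModel:
    """Each learned entity carries one scalar reward prediction."""

    def __init__(self, learning_rate=0.5):
        self.reward_prediction = {}  # entity id -> predicted reward
        self.learning_rate = learning_rate

    def update(self, entity, reward):
        """Nudge the entity's prediction toward the reward signal."""
        old = self.reward_prediction.get(entity, 0.0)
        self.reward_prediction[entity] = old + self.learning_rate * (reward - old)

    def should_switch(self, candidate, current):
        """Act on the candidate only if it predicts more reward than the current activity."""
        predicted = self.reward_prediction
        return predicted.get(candidate, 0.0) > predicted.get(current, 0.0)


SALTWATER_LEVER = 123456  # arbitrary learned entity, as in the example above


def ground_truth_reward(entity, salt_deprived):
    """Hypothetical reward signal: salt is aversive unless the rat is deprived."""
    if entity == SALTWATER_LEVER:
        return 1.0 if salt_deprived else -1.0
    return 0.0
```

In this picture the “urge” is not a separate module. It is the change in `ground_truth_reward` when `salt_deprived` flips, which then pulls the stored prediction for the lever above that of whatever the rat was doing before.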
Well, you are clearly an expert here. And indeed bridging from the neurons to algorithms has been an open problem since forever. What I meant is, assuming you needed to code, say, an NPC in a game, you would code an “urge” a certain way, probably in just a few dozen lines of code. Plus the underlying language, plus the compiler, plus the OS, plus the hardware, which is basically gates upon gates upon gates, all alike. There is no reinforcement learning there at all, and yet rows of nearly identical gates become an algorithm. Maybe some parts of the brain work like that, as well?
LOL, “fake it til you make it” ftw! I disagree, but that’s very kind of you to say. :-)
assuming you needed to code, say, an NPC in a game, you would code an “urge” a certain way, probably in just a few dozen lines of code.
Hmm, if the NPC didn’t need to be very good, I would use bacteria-level logic like “if you’re getting attacked, move away from the last hit, if you can attack, attack, otherwise move towards the player” or whatever. Then an “urge to be more aggressive” would be, like, move towards the player more quickly, or changing some thresholds. But there’s no foresight / planning here. So that’s not exactly relevant to this post. The rats are making a plan to get saltwater.
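Something like this, as a rough sketch. The class name, the fields, and the idea of scaling approach speed by an `aggression` number are all just made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReactiveNPC:
    attack_range: float = 1.0
    base_speed: float = 1.0
    aggression: float = 1.0  # the "urge" knob: higher means a faster approach

    def act(self, under_attack: bool, distance_to_player: float):
        """Return (action, speed) from the current situation only; no lookahead."""
        if under_attack:
            return ("move_away_from_last_hit", self.base_speed)
        if distance_to_player <= self.attack_range:
            return ("attack", 0.0)
        return ("move_toward_player", self.base_speed * self.aggression)
```

Turning up `aggression` changes how the NPC reacts, but every decision is still a function of what is in front of it right now.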
So, if I were going to make the NPC better, maybe I would next incorporate planning. Like: define an “expected reward” function (related to position, health, etc.), consider possible things to do next, and then pick the thing with the highest expected reward at the end. That might be more than a few dozen lines of code … I guess it depends on the library functions available :-P Then you could have an “urge” be expressed through tweaking the parameters of the expected-reward calculation. And then the NPC would be able to make plans to satisfy the urge, as opposed to just acting based on what’s right in front of it, or whatever.
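A minimal sketch of that kind of planner, with everything invented for the example: a one-dimensional corridor with positions 0 to 10, a brute-force search over short action sequences, and an expected-reward function whose `weights` parameter says how attractive each landmark is.

```python
from itertools import product

ACTIONS = {"left": -1, "stay": 0, "right": 1}
CORRIDOR_MAX = 10  # hypothetical world: positions 0..10


def simulate(position, plan):
    """Roll a sequence of actions forward and return the final position."""
    for action in plan:
        position = max(0, min(CORRIDOR_MAX, position + ACTIONS[action]))
    return position


def expected_reward(position, weights):
    """Score a position; weights maps landmark position -> attractiveness."""
    return sum(w / (1 + abs(position - landmark)) for landmark, w in weights.items())


def plan_first_action(position, weights, depth=3):
    """Try every plan of the given depth and return the first step of the best one."""
    best_plan = max(
        product(ACTIONS, repeat=depth),
        key=lambda plan: expected_reward(simulate(position, plan), weights),
    )
    return best_plan[0]


FOOD, SALT = 0, 10                    # landmark positions (assumed)
no_urge = {FOOD: 1.0, SALT: 0.0}      # salt is not worth anything right now
salt_urge = {FOOD: 1.0, SALT: 5.0}    # the urge: only the parameters change
```

With `no_urge` the NPC starting at position 5 heads left toward the food, and with `salt_urge` it heads right toward the salt. The planning code is identical in both cases; only the reward parameters differ.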
This thing where it considers multiple possible courses of action is not yet model-based reinforcement learning. There’s no learning! But I would say it’s a first step. There is indeed something like that as an ingredient in AlphaZero, for example.
But that’s actually all that matters for this post anyway, I think. The real RL part, where you learn things, didn’t come up. Maybe I shouldn’t have brought up RL at all, now that I think about it :-P