Where does “division of labor” fit in (e.g. Adam Smith’s discussion of the pin factory)? Like, 10^5 cells designed to pump blood + 10^5 cells designed to digest food would do better than 2×10^5 cells designed to simultaneously pump blood and digest food, right?
I guess I’m more generally concerned that you’re not distinguishing specialization from modularity. Specialization is about function, modularity is about connectivity, it seems to me. Organs are an example of specialization, not modularity, I think. Do neural networks have specialization? Yes, e.g. Chris Olah et al.’s curve detectors are akin to having cells specialized for pumping blood, I figure.
Organs are also “modular” but I think only for logistical reasons: the cells that digest food should all be close together, so that we can put the undigested food near them; The cells that process information should all be close together, because communication is slow; Muscle cells need to be close together and lined up, so that their forces can add up. Etc. Basically, there’s no teleportation in the real world. If there was teleportation of materials and forces and information etc. within an organism’s body, I wouldn’t necessarily expect “organs” in the normal sense. Instead maybe the “heart cells” and “digestion cells” and neurons etc. would all be scattered randomly about the body. And there would be a lot more connections between different parts. It would be better to have a direct teleportation link between the lungs and every cell of the body (to transport oxygen), and likewise it would be better to have a direct teleportation link between the digestive tract and every cell of the body (to transport sugar), etc., right? And if you did, a diagram of connectivity would make it look not very modular.
Neural nets on a computer do have teleportation, to some extent. For example, in a fully-connected layer, there’s no penalty for having every upstream neuron connect to every downstream neuron.
Incidentally, it’s not obvious to me that current ML algorithms are always significantly less modular than the human brain. Granted, the human brain has cortex, striatum, cerebellum, habenula, etc. But likewise a model-based RL algorithm could have an actor network, a critic network, a world-model network, human-written source code for the reward function, human-written source code for tree search, etc.
Regarding specialisation vs. modularity: I view the two as intimately connected, though not synonymous.
What makes a blood pumping cell a blood pumping cell and not a digestive cell? The way you connect the atoms, ultimately. So how come, when you want both blood pumping and digestion, it apparently works better to connect the atoms such that there’s a cluster that does blood pumping, and a basically separate cluster that does digesting, instead of a big digesting+blood pumping cluster?
You seem to take it as a given that it is so, but that’s exactly the kind of thing we’re setting out to explain!
“Because physics is local” is one reason, clearly. But if it were the only reason, designs without locality on our computers would never be modular. But they are, sometimes!
R.e: logistical reasons: Yes, that’s local connection cost due to locality again, basically.
We have not quantitatively scored modularity in current ML compared to modularity in the human brain (taking each neuron as a node) yet. It would indeed be interesting to see how that comes out. We have the Q score of some CNNs, thanks to CHAI. Do you know of any paper trying to calculate Q for the brain?
when you want both blood pumping and digestion, it apparently works better to connect the atoms such that there’s a cluster that does blood pumping, and a basically separate cluster that does digesting, instead of a big digesting+blood pumping cluster?
Are you asking: why doesn’t each individual cell in the cluster do both digestion and blood pumping? Or are you asking: why aren’t the digestion cells and blood-pumping cells intermingled? I suspect that those two questions have different answers (“specialization” and “logistics/locality” respectively).
Do you know of any paper trying to calculate Q for the brain?
Beats me. I had been assuming that you were thinking of gross anatomy (“the cerebellum is over here, and the cortex is over there, and they look different and they do different things etc.”), by analogy with the liver and heart etc.
I am asking both! I suspect the reasons are likely to be very similar, in the sense that you can find a set of general modularity theorems that will predict you both phenomena.
Why does specialisation work better? It clearly doesn’t always. A lot of early NN designs love to mash everything together into an interconnected mess, and the result performs better on the loss function than modular designs that have parts specialised for each subtask.
Are connection costs due to locality a bigger deal for inter-cell dynamics than intra-cell dynamics? I’d guess yes. I am not a biologist, but it sure seems like interacting with things in the same cell as you should be relatively easier, though still non-trivial.
Are connection costs in the inter-cell regime so harsh that they completely dominate all other modularity selection effects in that regime, so we don’t need to care about them? I’m not so sure. I suspect not.
Beats me. I had been assuming that you were thinking of gross anatomy (“the cerebellum is over here, and the cortex is over there, and they look different and they do different things etc.”), by analogy with the liver and heart etc.
I’m thinking about those too. The end goal here is literally a comprehensive model for when modularity happens and how much, for basically anything at any scale that was made by an optimisation process like genetic algorithms, gradient descent, ADAM, or whatever.
There’s another issue that one individual cell, at any one time, can only have one membrane potential, and can only have one concentration of calcium ions in its cytoplasm, and can only have one concentration of any given protein in its cytoplasm, and can only have one areal density of any given integral membrane protein on its outer membrane, and so on. (If I understand correctly.) If the optimal values of any of these parameters (say, potassium ion concentration in the cytoplasm) is different for muscle-type-functionality versus neuron-type-functionality versus immune-system-type-functionality etc. (which seems extremely likely to me), then there would be an obvious benefit to splitting those functions into different cells, each with its own separate cytoplasm and so on.
But why is that so? Why are there no parameter combinations here that let you do well simultaneously on all of these tasks, unless you split your system into parts? That is what we are asking.
Could it be that such optima just do not exist? Maybe. It’s certainly not how it seems to work out in small neural networks, but perhaps for some classes of tasks, they really don’t, or are infrequent enough to not be finable by the optimiser. That’s the direct selection for modularity hypothesis, basically.
I don’t currently favour that one though. The tendency of our optimisers to connect everything to everything else even when it seems to us like it should be actively counterproductive to do so, but still end up with a good loss, suggests to me that our intuition that you can’t do good while trying to do everything at once is mistaken. At least as long as “doing good” is defined as scoring well on the loss function. If you add in things like robustness, you might have a very different story. Thus, MVG.
For example, gastric parietal cells excrete hydrochloric acid into the stomach. The cell is a little chemical factory. For the chemistry steps to work properly, the potassium ion concentration in the cytoplasm needs to be maintained at a certain level.
Meanwhile, neurons communicate information by firing action potentials. These involve the potassium concentration in the cytoplasm dropping rapidly at certain times, before slowly returning to a higher level.
So, now suppose the same cell is tasked with both excreting hydrochloric acid and communicating information by firing action potentials. I claim that it wouldn’t be able to do both well. Either the potassium concentration in the cytoplasm is maintained at the level which best facilitates the acid-producing chemical reactions, or else the potassium level in the cytoplasm is periodically dropping rapidly and climbing back up during action potentials. It can’t be both. Right?
(I’m not an expert on cell biology and could be messing up some details here, but I feel confident that there are lots and lots of true stories in this general category.)
It e.g. wouldn’t use potassium to send signals, I’d imagine. If a design like this exists, I’d expect it to involve totally different parts and steps that do not conflict like this. Something like a novel (to us) kind of ion channel, maybe, or something even stranger.
Does it seem to you that the constraints put on cell design are such that the ways of sending signals and digesting things we currently know of are the only ones that seem physically possible?
This is not a rhetorical question. My knowledge of cell biology is severely lacking, so I don’t have deep intuitions telling me which things seem uniquely nailed down by the laws of physics. I just had a look at the action potential wikipedia page, and didn’t immediately see why using potassium ions was the only thing evolution could’ve possibly done to make a signalling thing. Or why using hydrochloric acid would be the only way to do digestion.
You’re a chemical engineer. Your task is to manufacture chemicals A and B. But you only get one well-mixed tank: all the precursors of chemical A have to mix with each other and with the precursors of chemical B without undesired cross-reactions, and the A and B production process need to take place in the same salinity environment and the same pH environment, etc.
Maybe that’s doable.
Oh hey, now also please manufacture chemicals C,D,E,F,G,H,I,J and K. You still have one well-mixed tank in which to do all those things simultaneously.
Maybe that’s doable too. Or maybe not. And if it is doable at all, presumably the efficiency would be terrible.
At some point, you’re going to throw up your hands and go to your boss and say “This is insane, we need more than one well-mixed tank to get all this stuff done. At least let me have a high-pH tank for manufacturing A,B,E,G,H and a low-pH tank for manufacturing C,D,F,I,J,K. Pretty please, boss?”
Anyway, you can say “Design space is huge, I bet there’s a way to manufacture all these chemicals in the same well-mixed tank.” Maybe that’s true, I don’t know. But my response is: there is a best possible way to manufacture A, and there is a best possible way to manufacture B, etc., and these do not involve the exact same salinity, pH, etc. Therefore, when we put everything into one tank, that’s only possible by making tradeoffs. A chemical plant that has multiple tanks will be way better.
Back to biology. (I’m not an expert on cell biology either, be warned.)
A cell cytoplasm is a well-mixed chemical reactor tank, as far as I understand. It’s not compartmentalized (with certain exceptions).
My strong impression is that there are no one-size-fits-all jack-of-all-trades cells. Not in multicellular life, and not in single-cell life either. Different single-cell species specialize in performing different chemical reactions.
Thus, we can have single-cell organism Q that eats food X and spits out waste product Y, and then a different single-cell organism P eats Y and spits out waste product Z, etc. We can kibbitz from the sideline that Q is being “wasteful”: why spit out Y as waste, when in fact Y was digestible in principle—after all, P just digested it! But from my perspective this is pretty much expected. Maybe digesting Y and digesting X require very different salinity or pH, for example. So in order to digest Y at all, the cell Q would need to be much worse at digesting X, and that tradeoff winds up being not worthwhile.
didn’t immediately see why using potassium ions was the only thing evolution could’ve possibly done to make a signalling thing. Or why using hydrochloric acid would be the only way to do digestion.
It could be chloride ions instead of potassium, as far as I know. But there are only so many elements on the periodic table, and many fewer that are abundant in the evironment, and they all have different properties which may be disadvantageous for a particular function. There are other ways to do digestion, but they don’t all spend identical resources and produce identical results, right? So there are tradeoffs.
Where does “division of labor” fit in (e.g. Adam Smith’s discussion of the pin factory)? Like, 10^5 cells designed to pump blood + 10^5 cells designed to digest food would do better than 2×10^5 cells designed to simultaneously pump blood and digest food, right?
I guess I’m more generally concerned that you’re not distinguishing specialization from modularity. Specialization is about function, modularity is about connectivity, it seems to me. Organs are an example of specialization, not modularity, I think. Do neural networks have specialization? Yes, e.g. Chris Olah et al.’s curve detectors are akin to having cells specialized for pumping blood, I figure.
Organs are also “modular” but I think only for logistical reasons: the cells that digest food should all be close together, so that we can put the undigested food near them; The cells that process information should all be close together, because communication is slow; Muscle cells need to be close together and lined up, so that their forces can add up. Etc. Basically, there’s no teleportation in the real world. If there was teleportation of materials and forces and information etc. within an organism’s body, I wouldn’t necessarily expect “organs” in the normal sense. Instead maybe the “heart cells” and “digestion cells” and neurons etc. would all be scattered randomly about the body. And there would be a lot more connections between different parts. It would be better to have a direct teleportation link between the lungs and every cell of the body (to transport oxygen), and likewise it would be better to have a direct teleportation link between the digestive tract and every cell of the body (to transport sugar), etc., right? And if you did, a diagram of connectivity would make it look not very modular.
Neural nets on a computer do have teleportation, to some extent. For example, in a fully-connected layer, there’s no penalty for having every upstream neuron connect to every downstream neuron.
Incidentally, it’s not obvious to me that current ML algorithms are always significantly less modular than the human brain. Granted, the human brain has cortex, striatum, cerebellum, habenula, etc. But likewise a model-based RL algorithm could have an actor network, a critic network, a world-model network, human-written source code for the reward function, human-written source code for tree search, etc.
Regarding specialisation vs. modularity: I view the two as intimately connected, though not synonymous.
What makes a blood pumping cell a blood pumping cell and not a digestive cell? The way you connect the atoms, ultimately. So how come, when you want both blood pumping and digestion, it apparently works better to connect the atoms such that there’s a cluster that does blood pumping, and a basically separate cluster that does digesting, instead of a big digesting+blood pumping cluster?
You seem to take it as a given that it is so, but that’s exactly the kind of thing we’re setting out to explain!
“Because physics is local” is one reason, clearly. But if it were the only reason, designs without locality on our computers would never be modular. But they are, sometimes!
R.e: logistical reasons: Yes, that’s local connection cost due to locality again, basically.
We have not quantitatively scored modularity in current ML compared to modularity in the human brain (taking each neuron as a node) yet. It would indeed be interesting to see how that comes out. We have the Q score of some CNNs, thanks to CHAI. Do you know of any paper trying to calculate Q for the brain?
Are you asking: why doesn’t each individual cell in the cluster do both digestion and blood pumping? Or are you asking: why aren’t the digestion cells and blood-pumping cells intermingled? I suspect that those two questions have different answers (“specialization” and “logistics/locality” respectively).
Beats me. I had been assuming that you were thinking of gross anatomy (“the cerebellum is over here, and the cortex is over there, and they look different and they do different things etc.”), by analogy with the liver and heart etc.
I am asking both! I suspect the reasons are likely to be very similar, in the sense that you can find a set of general modularity theorems that will predict you both phenomena.
Why does specialisation work better? It clearly doesn’t always. A lot of early NN designs love to mash everything together into an interconnected mess, and the result performs better on the loss function than modular designs that have parts specialised for each subtask.
Are connection costs due to locality a bigger deal for inter-cell dynamics than intra-cell dynamics? I’d guess yes. I am not a biologist, but it sure seems like interacting with things in the same cell as you should be relatively easier, though still non-trivial.
Are connection costs in the inter-cell regime so harsh that they completely dominate all other modularity selection effects in that regime, so we don’t need to care about them? I’m not so sure. I suspect not.
I’m thinking about those too. The end goal here is literally a comprehensive model for when modularity happens and how much, for basically anything at any scale that was made by an optimisation process like genetic algorithms, gradient descent, ADAM, or whatever.
There’s another issue that one individual cell, at any one time, can only have one membrane potential, and can only have one concentration of calcium ions in its cytoplasm, and can only have one concentration of any given protein in its cytoplasm, and can only have one areal density of any given integral membrane protein on its outer membrane, and so on. (If I understand correctly.) If the optimal values of any of these parameters (say, potassium ion concentration in the cytoplasm) is different for muscle-type-functionality versus neuron-type-functionality versus immune-system-type-functionality etc. (which seems extremely likely to me), then there would be an obvious benefit to splitting those functions into different cells, each with its own separate cytoplasm and so on.
But why is that so? Why are there no parameter combinations here that let you do well simultaneously on all of these tasks, unless you split your system into parts? That is what we are asking.
Could it be that such optima just do not exist? Maybe. It’s certainly not how it seems to work out in small neural networks, but perhaps for some classes of tasks, they really don’t, or are infrequent enough to not be finable by the optimiser. That’s the direct selection for modularity hypothesis, basically.
I don’t currently favour that one though. The tendency of our optimisers to connect everything to everything else even when it seems to us like it should be actively counterproductive to do so, but still end up with a good loss, suggests to me that our intuition that you can’t do good while trying to do everything at once is mistaken. At least as long as “doing good” is defined as scoring well on the loss function. If you add in things like robustness, you might have a very different story. Thus, MVG.
For example, gastric parietal cells excrete hydrochloric acid into the stomach. The cell is a little chemical factory. For the chemistry steps to work properly, the potassium ion concentration in the cytoplasm needs to be maintained at a certain level.
Meanwhile, neurons communicate information by firing action potentials. These involve the potassium concentration in the cytoplasm dropping rapidly at certain times, before slowly returning to a higher level.
So, now suppose the same cell is tasked with both excreting hydrochloric acid and communicating information by firing action potentials. I claim that it wouldn’t be able to do both well. Either the potassium concentration in the cytoplasm is maintained at the level which best facilitates the acid-producing chemical reactions, or else the potassium level in the cytoplasm is periodically dropping rapidly and climbing back up during action potentials. It can’t be both. Right?
(I’m not an expert on cell biology and could be messing up some details here, but I feel confident that there are lots and lots of true stories in this general category.)
It e.g. wouldn’t use potassium to send signals, I’d imagine. If a design like this exists, I’d expect it to involve totally different parts and steps that do not conflict like this. Something like a novel (to us) kind of ion channel, maybe, or something even stranger.
Does it seem to you that the constraints put on cell design are such that the ways of sending signals and digesting things we currently know of are the only ones that seem physically possible?
This is not a rhetorical question. My knowledge of cell biology is severely lacking, so I don’t have deep intuitions telling me which things seem uniquely nailed down by the laws of physics. I just had a look at the action potential wikipedia page, and didn’t immediately see why using potassium ions was the only thing evolution could’ve possibly done to make a signalling thing. Or why using hydrochloric acid would be the only way to do digestion.
You’re a chemical engineer. Your task is to manufacture chemicals A and B. But you only get one well-mixed tank: all the precursors of chemical A have to mix with each other and with the precursors of chemical B without undesired cross-reactions, and the A and B production process need to take place in the same salinity environment and the same pH environment, etc.
Maybe that’s doable.
Oh hey, now also please manufacture chemicals C,D,E,F,G,H,I,J and K. You still have one well-mixed tank in which to do all those things simultaneously.
Maybe that’s doable too. Or maybe not. And if it is doable at all, presumably the efficiency would be terrible.
At some point, you’re going to throw up your hands and go to your boss and say “This is insane, we need more than one well-mixed tank to get all this stuff done. At least let me have a high-pH tank for manufacturing A,B,E,G,H and a low-pH tank for manufacturing C,D,F,I,J,K. Pretty please, boss?”
Anyway, you can say “Design space is huge, I bet there’s a way to manufacture all these chemicals in the same well-mixed tank.” Maybe that’s true, I don’t know. But my response is: there is a best possible way to manufacture A, and there is a best possible way to manufacture B, etc., and these do not involve the exact same salinity, pH, etc. Therefore, when we put everything into one tank, that’s only possible by making tradeoffs. A chemical plant that has multiple tanks will be way better.
Back to biology. (I’m not an expert on cell biology either, be warned.)
A cell cytoplasm is a well-mixed chemical reactor tank, as far as I understand. It’s not compartmentalized (with certain exceptions).
My strong impression is that there are no one-size-fits-all jack-of-all-trades cells. Not in multicellular life, and not in single-cell life either. Different single-cell species specialize in performing different chemical reactions.
Thus, we can have single-cell organism Q that eats food X and spits out waste product Y, and then a different single-cell organism P eats Y and spits out waste product Z, etc. We can kibbitz from the sideline that Q is being “wasteful”: why spit out Y as waste, when in fact Y was digestible in principle—after all, P just digested it! But from my perspective this is pretty much expected. Maybe digesting Y and digesting X require very different salinity or pH, for example. So in order to digest Y at all, the cell Q would need to be much worse at digesting X, and that tradeoff winds up being not worthwhile.
It could be chloride ions instead of potassium, as far as I know. But there are only so many elements on the periodic table, and many fewer that are abundant in the evironment, and they all have different properties which may be disadvantageous for a particular function. There are other ways to do digestion, but they don’t all spend identical resources and produce identical results, right? So there are tradeoffs.