See e.g. papers finding that you can use a linear function to translate some concepts between brain scans and internal layers in an LLM, or the extremely close correspondence between ConvNet features and neurons in the visual cortex.
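For concreteness, here is a minimal sketch of the kind of linear "encoding model" such papers fit, mapping an LLM's hidden-layer activations to brain-scan responses with ridge regression. The array names and sizes are hypothetical stand-ins, not data from any particular study:

```python
# Minimal sketch of a linear "encoding model": predict brain responses from
# LLM hidden states with a ridge-regularised linear map.
# All data below are random placeholders, purely for illustration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, n_hidden, n_voxels = 500, 768, 1000

# One row per stimulus (e.g. a word or sentence seen by both the LLM and the subject).
llm_layer_activations = rng.normal(size=(n_stimuli, n_hidden))  # X: LLM layer features
fmri_responses = rng.normal(size=(n_stimuli, n_voxels))         # Y: brain-scan responses

X_train, X_test, Y_train, Y_test = train_test_split(
    llm_layer_activations, fmri_responses, test_size=0.2, random_state=0)

# A single linear map (plus intercept) "translating" LLM features into voxel responses.
model = Ridge(alpha=1.0).fit(X_train, Y_train)

# With random placeholder data this held-out R^2 will be near zero; the papers'
# claim is that with real paired data such a linear fit generalises surprisingly well.
print("held-out R^2:", model.score(X_test, Y_test))
```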
I would love links to these if you have time.
But also, let's say it's true that there is similarity in the internal structure of the end results (adult human brain and trained LLM). The adult human brain was produced by evolution plus learning after birth; the trained LLM was produced by gradient descent. This does not tell me that evolution doesn't matter and learning after birth does.
> But most of the human brain (the neocortex) already learns its ‘weights’ from experience over a human lifetime, in a way that’s not all that different from self-supervised learning if you squint.
The difference is that the weights are not initialised with random values at birth (or at the embryo stage, to be more precise).
They only apply in a weaker sense, where you are aware you're working with an analogy and should hopefully be tracking some more detailed model behind the scenes.
What do you mean by "weaker sense"? I say irrelevant and you say weaker sense, so we're not yet in agreement. How much predictive power does this analogy have, according to you personally?
> I would love links to these if you have time.

Some survey articles:
https://arxiv.org/abs/2306.05126
https://arxiv.org/pdf/2001.07092

> The difference is that the weights are not initialised with random values at birth (or at the embryo stage, to be more precise).
The human cortex (the part we have way more of than chimps) is initialized to be made of a bunch of cortical column units, with slowly varying properties over the surface of the brain. But there’s decent evidence that there’s not much more initialization than that, and that that huge fraction of the brain has to slowly pick up knowledge within the human lifetime before it starts being useful, e.g. https://pmc.ncbi.nlm.nih.gov/articles/PMC9957955/
Or you could think about it this way: our DNA has on the order of a megabyte to spend on the brain, while the adult brain holds on the order of a terabyte of information. So 99.99[..]% of the information in the adult brain comes from the learning algorithm, not the initialization.
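As a quick back-of-the-envelope check of that ratio (a minimal sketch; the megabyte and terabyte figures are the order-of-magnitude assumptions from the paragraph above, not measurements):

```python
# Back-of-the-envelope: what fraction of the adult brain's information could
# the genome account for, given the order-of-magnitude figures above?
dna_budget_bytes = 1e6    # ~1 MB of DNA "spent" on the brain (assumed figure)
adult_brain_bytes = 1e12  # ~1 TB of information in the adult brain (assumed figure)

fraction_from_learning = 1 - dna_budget_bytes / adult_brain_bytes
print(f"{fraction_from_learning:.4%}")  # -> 99.9999%
```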
> How much predictive power does this analogy have, according to you personally?
Yeah, it's way more informative than the evolution analogy to me, because I expect the process of human researchers plus computers spending resources to design AI to be pretty hard to analogize to evolution, whereas learning within an AI is within a few orders of magnitude, on various resources, of learning within a brain's lifetime.
Thanks for the links. Might go through when I find time.
Even if the papers prove that there are similarities, I don't see how this proves anything about evolution versus within-lifetime learning.
> The human cortex (the part we have way more of than chimps) is initialized to be made of a bunch of cortical column units, with slowly varying properties over the surface of the brain. But there's decent evidence that there's not much more initialization than that, and that that huge fraction of the brain has to slowly pick up knowledge within the human lifetime before it starts being useful, e.g. https://pmc.ncbi.nlm.nih.gov/articles/PMC9957955/
This seems like your strongest argument. I will have to study more to understand this.
> our DNA has on the order of a megabyte to spend on the brain
That’s it? Really? That is new information for me.
Tbh your arguments might end up being persuasive to me, so thank you for writing them.
The problem is that building a background in neuroscience, to the point where I'm confident I'm not being fooled, will take time. And I'm interested in neuroscience, but not interested enough to study it just for AI safety reasons. If you have a post that covers this argument well (around the initialisation not storing a lot of information), that would be nice. (But not necessary of course; that's up to you.)