yeah.
evolution = grad student descent, AutoML, etc.
DNA = training code
epigenetics = hyperparams
gestation = weight init, with a lot of built-in preset weights, plus a huge mostly-tiled neocortex
developmental processes = training loop
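A toy, self-contained sketch of this mapping (every name and number below is made up for illustration, not taken from any real system):

```python
# Toy illustration of the analogy above; all names and numbers are invented.
import random

def train(hyperparams, steps=200):
    """One 'lifetime': the inner training loop ('developmental processes')."""
    random.seed(0)
    w = [random.gauss(0, 0.1) for _ in range(4)]          # 'gestation': weight init (random here;
                                                           # the analogy has many preset weights)
    lr = hyperparams["lr"]                                 # 'epigenetics': hyperparameters
    for _ in range(steps):
        x = [random.gauss(0, 1) for _ in range(4)]
        target = sum(x)                                    # toy task: learn to sum the inputs
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - target
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]   # gradient step
    return w

# 'DNA' = the training code itself (the function above).
# 'evolution' = an outer search over that code / its settings:
candidates = [{"lr": 0.001}, {"lr": 0.01}, {"lr": 0.1}]

def fitness(hp):
    w = train(hp)
    return -sum((wi - 1.0) ** 2 for wi in w)               # ideal weights are all 1.0 here

best = max(candidates, key=fitness)
print("selected hyperparams:", best)
```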
I think this is partly true but mostly wrong.
A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information can be stored or how much information it takes to specify its strength.
There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture and innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.
Humans also spend a lot of time learning and have long childhoods, after which they have tons of knowledge that (I assert) could never have been crammed into a few dozen or hundred megabytes.
So I think something like 99.9% of what humans “know” (in the sense of their synaptic strengths) is learned during their lives, from their experiences.
This makes them basically disanalogous to neural nets.
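Before the side-by-side comparison, here is a back-of-envelope version of that ~99.9% figure; every number is a rough, commonly quoted ballpark rather than a measurement, but the conclusion is insensitive to the exact values:

```python
# Back-of-envelope check of the "~99.9% learned" claim (all figures are rough assumptions).
base_pairs       = 3e9     # total human genome: a few billion base pairs
bits_per_bp      = 2       # 4 possible bases -> 2 bits each (ignoring compressibility)
genome_bits      = base_pairs * bits_per_bp              # upper bound on innate information

synapses         = 1e14    # "trillions of synapses" (often quoted around 10^14)
bits_per_synapse = 5       # a parameter-like few bits per synapse, per the claim above
synaptic_bits    = synapses * bits_per_synapse

print(f"genome:   ~{genome_bits / 8 / 1e6:.0f} MB")       # ~750 MB, before narrowing to brain genes
print(f"synapses: ~{synaptic_bits / 8 / 1e12:.1f} TB")
print(f"innate / total: {genome_bits / synaptic_bits:.4%}")  # well under 0.01%, so "99.9%+ learned"
```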
Neural net (LLM):
Extremely concise architecture (kB’s of code) contains inductive biases
Lots of pretraining (billions of tokens or optimizer steps) produces hundreds of billions of parameters’ worth of pretrained knowledge, e.g. facts about Lincoln
Smaller fine-tuning stage produces more specific behavior, e.g. ChatGPT’s distinctive “personality”, stored in the same parameters
Tiny amount of in-context learning (hundreds or thousands of tokens) involves things like induction heads and lets the model incorporate information from anywhere in the prompt in its response
Humans:
Enormous amount of evolution (thousands to millions of lifetimes?) produces a relatively small genome (a few billion base pairs, i.e. well under a gigabyte of information)
Much shorter amount of experience in childhood (and later) produces many trillions of synapses’ worth of knowledge and learned skills
Short-term memory, the phonological loop, etc. let humans make use of temporary information from the recent environment (rough numbers for both lists below)
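To put rough numbers on the two lists (every figure below is an assumed ballpark, in bytes, chosen only to show the relative sizes):

```python
# Rough tabulation of the two lists above; all numbers are assumed ballparks, in bytes.
llm = {
    "architecture code":     5e3,    # "kB's of code"
    "pretrained parameters": 2e11,   # hundreds of billions of params, ~1 byte each
    "context window":        4e3,    # hundreds/thousands of tokens, a few bytes each
}
human = {
    "genome":                7.5e8,  # ~3e9 base pairs at 2 bits each
    "synapses":              6e13,   # ~1e14 synapses at a few bits each
    "short-term memory":     1e2,    # a handful of items; tiny on any estimate
}

for name, column in [("LLM", llm), ("Human", human)]:
    print(name)
    for stage, size in column.items():
        print(f"  {stage:<22}{size:>10.0e} bytes")
# Both columns show the same shape: a small specification, a huge learned store,
# and a tiny working buffer.
```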
You’re analogizing pretraining to evolution, which seems wrong to me (99.9% of human synaptic information comes from our own experiences); I’d say evolution is closer to the inductive bias baked into the architecture, but neural nets don’t have a bottleneck analogous to the genome.
In-context learning seems even more disanalogous to a human lifetime of experiences: the pretrained weights of a neural net massively dwarf the context window or residual stream in information content, which looks much more like total human synaptic strengths vs. short-term memory than like genome vs. learned synaptic strengths.
I would be more willing to analogize human experiences/childhood/etc. to fine-tuning, but I think the situation is just pretty different with regard to relative orders of magnitude, because of the gene bottleneck.
I was eventually convinced of most of your points, and added a long mistakes-list at the end of the post. I would really appreciate comments on the list, as I don’t feel fully converged on the subject yet.
I think we have much more disagreement about psychology than about AI, though I admit to low certainty about the psychology too.
About AI, my point was that, in understanding the problem, the training loop takes roughly the role of evolution and the model takes that of the evolved agent, with implications for comparing their success, and possibly for identifying what’s missing. I did refer to the fact that algorithmically we took ideas from the human brain for the training loop, so it makes sense for the training loop to be algorithmically more analogous to the brain. Given that clarification, do you still mostly disagree? (If not, how do you recommend changing the post to make it clearer?)
Adding “short-term memory” to the picture is interesting, but then is there any mechanism for it to become long-term?
About the psychology: I do find the genetic-bottleneck argument intuitively convincing, but I think we have reasons to distrust this intuition. There is often a huge disparity between data in its most condensed form and data in a form that is convenient to use in deployment. Think of the difference in length between code written in a functional/declarative language and its compiled assembly. I have literally no intuition as to what can be done with 10 megabytes of condensed Python, but I guess that it is more than enough to automate a human, if you know what code to write. While there is probably a lot of redundancy in the genome, it seems at least as likely that there is huge redundancy among synapses, since their use is not just to store information but mostly to implement the needed information manipulations.
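As a toy illustration of that condensed-vs-deployed gap (the example is mine, purely illustrative): a one-line declarative spec versus the same information unrolled into a directly usable form.

```python
# Toy illustration of condensed spec vs. deployed form; the example is purely illustrative.

# Condensed, declarative form: one short line specifies the whole table.
squares = {n: n * n for n in range(1000)}

# "Deployed" form: the same information laid out for direct use, like a lookup table
# (loosely analogous to synapses wired for the needed manipulations).
unrolled_source = "\n".join(f"squares[{n}] = {n * n}" for n in range(1000))

condensed_spec = "squares = {n: n * n for n in range(1000)}"
print(len(condensed_spec), "bytes of condensed spec")
print(len(unrolled_source), "bytes when unrolled")
# The expansion factor here is a few hundred x; the point is only that a short
# generator can specify a much larger usable structure, not that the genome works this way.
```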