So here’s one important difference between humans and neural networks: humans face the genomic bottleneck, which means that each individual has to rederive all the knowledge about the world that their parents already had. If this genetic bottleneck hadn’t been so tight, then individual humans would have been significantly less capable of performing novel tasks.
I disagree with this in an interesting way. (Not particularly central to the discussion, but since both Richard & Eliezer thought the quoted claim is basically-true, I figured I should comment on it.)
First, outside view evidence: most of the genome is junk. That’s pretty strong evidence that the size of the genome is not itself a taut constraint. If there were evolutionary fitness gains to be had, in general, by passing more information via the genome, then we should expect that to have evolved already.
Second, inside view: overparameterized local search processes (including evolution and gradient descent on NNs) perform information compression by default. This is a technical idea that I haven’t written up properly yet, but as a quick sketch… suppose that I have a neural net with N parameters. It’s overparameterized, so there are many degrees of freedom in any optimum—i.e. there’s a whole optimal surface, not just an optimal point. Now suppose that I can build a near-perfect model of the training data by setting only M (< N) parameter-values; once those M values are set, the output is screened off from the remaining N-M parameters, which can take any values at all. (I’ll call the set of M parameter-values a “model”.) The smaller M, the larger N-M, and therefore the more possible parameter-values achieve optimality using this model. And the more possible parameter-values achieve optimality using the model, the more of the optimum-space this “model” fills. In practice, for something like evolution or gradient descent, this would mean a broad peak.
Rough takeaway: broader peaks in the fitness-landscape are precisely those which require fixing fewer parameters. Fixing fewer parameters, while still achieving optimality, requires compressing all the information-required-to-achieve-optimality into those few parameters. The more compression, the broader the peak, and the more likely that a local search process will find it.
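Here’s a minimal numerical sketch of the volume argument (the toy loss, the dimensions, and the 0.05 threshold are all illustrative choices of mine, not anything from the comment): the loss depends only on the first M of N parameters, so the optima form an (N-M)-dimensional surface, and the near-optimal fraction of parameter space blows up as M shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10  # total parameters; the toy "model" is overparameterized

# Toy loss depending only on the first M parameters: near-optimal iff
# theta[:M] is near zero, so the remaining N - M parameters are free
# and the optimum is an (N - M)-dimensional surface, not a point.
samples = rng.uniform(-1, 1, size=(200_000, N))
for M in (1, 4):
    losses = np.sum(samples[:, :M] ** 2, axis=1)
    frac = np.mean(losses < 0.05)
    print(f"M={M}: fraction of parameter space near-optimal ≈ {frac:.1e}")
```

For M=1 the near-optimal fraction comes out around 2e-1; for M=4 it drops to around 8e-4. Fixing fewer parameters really does correspond to a broader peak.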
Large genomes have (at least) two kinds of costs. The first is the energy and other resources required to copy the genome whenever your cells divide. The existence of junk DNA suggests that this cost is not a limiting factor. The other cost is that a larger genome will have more mutations per generation, so maintaining that genome across time uses up more selection pressure. Junk DNA requires no maintenance, so it provides no evidence either way. The selection-pressure cost could still be the reason why we don’t see more knowledge about the world being transmitted genetically.
A gene-level way of saying the same thing is that even a gene that provides an advantage may not survive if it takes up a lot of genome space, because it will be destroyed by the large number of mutations.
Good point, I wasn’t thinking about that mechanism.
However, I don’t think this creates an information bottleneck in the sense needed for the original claim in the post, because the marginal cost of storing more information in the genome does not increase via this mechanism as the amount-of-information-passed increases. Each gene just needs to offer a large enough fitness advantage to counter the noise on that gene; the requisite fitness advantage does not change depending on whether the organism currently has a hundred information-passing genes or a hundred thousand. It’s not really a “bottleneck” so much as a fixed price: the organism can pass any amount of information via the genome, so long as each base-pair contributes marginal fitness above some fixed level.
It does mean that individual genes can’t be too big, but it doesn’t say much about the number of information-passing genes (so long as separate genes have mostly-decoupled functions, which is indeed the case for the vast majority of gene pairs in practice).
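To make the “fixed price” concrete, here’s a back-of-the-envelope using the textbook mutation–selection-balance approximation (a broken allele equilibrates at frequency roughly mu/s); the gene size and rates below are illustrative assumptions of mine, not from the thread:

```python
# Mutation–selection balance: a deleterious allele equilibrates at
# frequency ~ mu/s, where mu is the rate at which the gene breaks per
# generation and s is the fitness cost of the broken version.
mu_per_bp = 1e-8      # assumed per-base-pair mutation rate per generation
gene_length = 1_000   # assumed base pairs that must stay functional
mu_gene = mu_per_bp * gene_length  # ~1e-5 chance of breaking per generation

for s in (1e-6, 1e-5, 1e-4, 1e-3):
    q = min(1.0, mu_gene / s)  # equilibrium fraction of broken copies
    print(f"fitness advantage s={s:.0e}: ~{q:.0%} of copies broken")
```

The threshold s has to clear depends only on the gene’s own mutational target size, not on how many other information-passing genes the genome carries.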
Here’s the argument I’d give for this kind of bottleneck. I haven’t studied evolutionary genetics; maybe I’m thinking about it all wrong.
In the steady state, an average individual has n children in their life, and just one of those n makes it to the next generation. (Crediting a child 1/2 to each parent.) This gives log2(n) bits of error-correcting signal to prune deleterious mutations. If the genome length times the functional bits per base pair times the mutation rate is greater than that log2(n), then you’re losing functionality with every generation.
One way for a beneficial new mutation to get out of this bind is by reducing the mutation rate. Another is refactoring the same functionality into fewer bits, freeing up bits for something new. But a generic fitness advantage doesn’t change the core of the argument: the signal from purifying selection gets shared by the whole genome.
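To put rough numbers on this bound (a sketch; the figures below are order-of-magnitude assumptions of mine, not from the thread):

```python
import math

mu = 1e-8          # assumed mutations per base pair per generation (human-ish)
bits_per_bp = 2    # upper bound: a base pair stores at most 2 bits
n_children = 4     # so selection supplies log2(4) = 2 bits per generation

# Functionality is maintained only while (functional bp) * (bits/bp) * mu
# stays below log2(n), so the maintainable functional sequence is bounded:
selection_bits = math.log2(n_children)
max_functional_bp = selection_bits / (mu * bits_per_bp)
print(f"selection budget: {selection_bits:.0f} bits/generation")
print(f"maintainable functional sequence: ~{max_functional_bp:.0e} bp")
```

That lands at roughly 1e8 maintainable functional base pairs, a few percent of the ~3.2 billion base-pair human genome, which fits the observation that most of the genome is junk.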
most of the genome is junk. That’s pretty strong evidence that the size of the genome is not itself a taut constraint.
My guess is that this is a total misunderstanding of what’s meant by “genomic bottleneck”. The bottleneck isn’t the amount of information storage, it’s the fact that the genome can only program the mind in a very indirect, developmental way, so that it can install stuff like “be more interested in people” but not “here’s how to add numbers”.
That seems wrong: living creatures have lots of specific behaviors that are genetically programmed.
In fact I think both you and John are misunderstanding the bottleneck. The point isn’t that the genome is small, nor that it affects the mind indirectly. The point is that the mind doesn’t affect the genome. Living creatures don’t have the tech to encode their life experience into genes for the next generation.
I’ve appreciated this comment thread! My take is that you’re all talking about different relevant things. It may well be the case that there are multiple reasons why more skills and knowledge aren’t encoded in our genomes: a) it’s hard to get that information in (from parents’ brains), b) it’s hard to get that information out (to children’s brains), and c) having large genomes is costly. What I’m calling the genomic bottleneck is a combination of all of them (although I think John is probably right that c) is not the main reason).
What would falsify my claim about the genomic bottleneck is if the main reason there isn’t more information passed on via genomes is because d) doing so is not very useful. That seems pretty unlikely, but not entirely out of the picture. E.g. we know that evolution is able to give baby deer the skill of walking shortly after birth, so it seems like d) might be the best explanation of why humans can’t do that too. But deer presumably evolved that skill over a very long time period, whereas I’m more interested in rapid changes.
Do you think you can encode good flint-knapping technique genetically? I doubt that.
I think I agree with your point, and think it’s a more general and correct statement of the bottleneck; but, still, I think that the genome does mainly affect the mind indirectly, and this is one of the constraints making it the case that humans have lots of learning / generalizing capability. (This doesn’t just apply to humans. What are some stark examples of animals with hardwired complex behaviors? With a fairly high bar for “complex”, and a clear explanation of what is hardwired and how we know. Insects have some fairly complex behaviors, e.g. web building, ant-hill building, the tree-leaf nests of weaver ants, etc.; but IDK enough to rule out a combination of a little hardwiring, some emergence, and some learning. Lots of animals hunt after learning from their parents how to hunt. I think a lot of animals can walk right after being born? I think beavers in captivity will fruitlessly chew on wood, indicating that the wild phenotype is encoded by something simple like “enjoys chewing” (plus, learned desire for shelter), rather than “use wood for dam”.)
An operationalization of “the genome directly programs the mind” would be that things like [the motions employed in flint-knapping] can be hardwired by small numbers of mutations (and hence can be evolved given a few million relevant years). I think this isn’t true, but counterevidence would be interesting. Since the genome can’t feasibly directly encode behaviors, or at least can’t evolve them quickly enough to keep up with a changing niche, the species instead evolves to learn behaviors on the fly via algorithms that generalize. If there were *either* mind-mind transfer, *or* direct programming of behavior by the genome, then higher frequency changes would be easier and there’d be less need for fluid intelligence. (In fact it’s sort of plausible to me (given my ignorance) that humans are imitation specialists and are less clever than Neanderthals were, since mind-mind transfer can replace intelligence.)
Some animal behaviours are certainly hardwired. There is the famous case of hygienic honeybees, which resist a brood pathogen because of a specific cleaning behaviour that classic breeding experiments traced to just a couple of genes.
One important point that should be brought up in this context is sexual recombination.
If a part of the genome encodes a complex behaviour, it can get reshuffled in the next generation. You would need some pretty powerful error-correcting code to keep things working.
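Here’s a minimal simulation of that reshuffling cost, under simplifying assumptions of mine: the behaviour needs a specific allele at each of k fully unlinked loci, one parent is heterozygous for all k, and the other parent carries none of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Under independent assortment, each gamete from the carrier parent
# includes each required allele with probability 1/2, so the full
# k-locus combination survives with probability (1/2)^k.
for k in (1, 5, 10):
    gametes = rng.integers(0, 2, size=(100_000, k))  # 1 = required allele
    intact = gametes.all(axis=1).mean()
    print(f"k={k:>2} loci: behaviour survives with prob ≈ {intact:.4f} "
          f"(theory: {0.5 ** k:.1e})")
```

Linkage (the loci sitting close together on one chromosome) raises the survival probability, but the qualitative point stands: combinations spread across many loci are fragile under recombination unless they’re tightly linked or individually beneficial.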