I expect that the mother does not add much information on top of the DNA; so yes, her contribution is complex and necessary, but I think you have to count almost only the size of the DNA as inductive bias. That said, this is a gut guess!
However, there is a recurring theme I’ve seen in discussions about AI where people express incredulity about neural networks as a method for AGI since they require so much “more data” than humans to train. My argument was merely that we should expect things to take a lot of data, and situations where they don’t are illusory.
Yeah, I got this; I have the same impression. The way I think about the topic is: “The NN requires tons of data to learn human language because it’s a totally alien mind, while humans produced their language themselves, so it’s tautologically adapted to their base architecture; you learn it easily only because it’s designed to be learned by you.”
But after encountering the DNA size argument myself a while ago, I started doubting this framework. It may be possible to do much, much better than what we do now.
Yeah, I agree that it’s a surprising fact requiring a bit of updating on my end. But I think the compression point probably matters more than you would think, and I’m finding myself more convinced the more I think about it. A lot of processing goes into turning that 1GB into a brain, and that processing may not be highly reducible. That’s sort of what I was getting at, and I’m not totally sure the complexity of that process wouldn’t add up to a lot more than 1GB.
It’s tempting to think of DNA as sufficiently encoding a human, but (speculatively) it may make more sense to think of DNA only as the input to a very large function which outputs a human. It seems strange, but it’s not like anyone’s ever built a human (or any other organism) in a lab from DNA alone; it’s definitely possible that there’s a huge amount of information stored in the processes of a living human which isn’t sufficiently encoded just by DNA.
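To make that concrete with a toy analogy (nothing biological about it, just compression): a short compressed input only reproduces its output given the decompressor, so an honest description length has to count both. A minimal Python sketch:

```python
# Toy illustration: the cost of reproducing an output includes the
# decompressor, not just the compressed input.
import zlib

payload = b"spam" * 1_000_000        # a large, highly regular "organism"
compressed = zlib.compress(payload)  # a tiny "DNA-sized" input

print(len(compressed))  # a few KB of input...
print(len(payload))     # ...expands to 4 MB of output, but only because
                        # zlib (the "decompressor") exists. By analogy,
                        # "<1 GB of DNA" omits the cellular machinery
                        # that does the expanding.
```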
You don’t even have to zoom out to things like organs or the brain. Just knowing which base triplets (codons) map to which amino acids is an (admittedly simple) example of processing that exists outside of the DNA encoding itself.
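As a purely illustrative fragment (a few hand-picked codons, not the full 64-entry table), here’s what that looks like in Python; the point is just that the table itself is machinery living outside the DNA string:

```python
# A fragment of the standard genetic code: codons (base triplets) -> amino acids.
# The mapping lives in the ribosome/tRNA machinery, not in the DNA string itself.
CODON_TABLE = {
    "ATG": "Met",   # methionine; also the start codon
    "TGG": "Trp",   # tryptophan
    "GGC": "Gly",   # glycine
    "TAA": "Stop",  # one of the three stop codons
    # ... 64 codons in the full table
}

def translate(dna: str) -> list[str]:
    """Read a DNA string three bases at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "Stop":
            break
        protein.append(aa)
    return protein

print(translate("ATGTGGGGCTAA"))  # ['Met', 'Trp', 'Gly']
```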
Even if you include a very generous epigenetic and womb-environmental component 9x bigger than the DNA component, any possible human baby at birth would need less than 10 GB to describe it completely at DNA levels of compression.
A human adult at age 25 would probably need a lot more to cover all possible development scenarios, but even then I can’t see it being more than 1000x, so 10 TB should be enough.
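For what it’s worth, a quick back-of-envelope in Python, where every number is a rough assumption, lands comfortably inside those bounds:

```python
# Back-of-envelope for the sizes above (rough assumptions, not measurements):
base_pairs = 3.2e9   # approximate human genome length
bits_per_base = 2    # 4 possible bases -> 2 bits each, before any compression

dna_bytes = base_pairs * bits_per_base / 8
print(f"raw genome: {dna_bytes / 1e9:.2f} GB")  # ~0.8 GB, i.e. "under 1 GB"

baby = 10 * dna_bytes   # DNA plus a generous 9x epigenetic/womb component
adult = 1000 * baby     # the 1000x guess for development up to age 25
print(f"baby:  {baby / 1e9:.0f} GB")    # ~8 GB, under the 10 GB bound
print(f"adult: {adult / 1e12:.0f} TB")  # ~8 TB, under the 10 TB bound
```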
For reference, Windows Server 2016 supports up to 24 TB of RAM and many petabytes of attached storage.
I think you’re broadly right, but it’s worth mentioning that DNA is a probabilistic compression (evidence: differences between identical twins), so it gets weird when you talk about compressing an adult at age 25: what is probabilistic compression at that point?
But I think you’ve mostly convinced me. Whatever it takes to “encode” a human, it’s possible to compress it into something very small.
A minor nitpick: DNA, as an encoding concept, is not probabilistic; it’s everything surrounding it, such as the packaging, 3D shape, epigenetics, etc., plus random mutations, transcription errors, etc., that causes identical twins to deviate.
Of course it is so compact because it doesn’t bother spending many ‘bits’ on ancillary capabilities to correct operating errors.
But it’s at least theoretically possible for it to be deterministic under ideal conditions.
To that first sentence: I don’t want to get lost in semantics here. My specific statement is that the process that takes DNA to a human is probabilistic with respect to the DNA sequence alone. Add in all that other stuff, and maybe at some point it becomes deterministic, but at that point you are no longer discussing the <1 GB that makes up DNA. If you wanted it to be truly deterministic, especially up to the age of 25, I seriously doubt it could be done in less than millions of petabytes, because there are such a huge number of minuscule variations in conditions, and I suspect human development is a highly chaotic process.
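Here’s a toy model of what I mean by “probabilistic with respect to the DNA sequence alone” (purely illustrative, no biology in it): the outcome is only deterministic once you also pin down all the noise.

```python
import random

def develop(dna: str, seed=None) -> int:
    """Toy 'development': the outcome depends on dna AND environmental noise."""
    rng = random.Random(seed)             # the seed stands in for every tiny condition
    noise = rng.random()
    return hash(dna) ^ int(noise * 1e9)  # outcome mixes genome and environment

dna = "ATGC" * 4
print(develop(dna) == develop(dna))      # almost always False: same "DNA",
                                         # different outcome ("identical twins")
print(develop(dna, seed=42) == develop(dna, seed=42))
                                         # True: deterministic only once you
                                         # also encode all the noise
```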
As you said, though, we’re at the point of minor nitpicks here. It doesn’t have to be a deterministic encoding for your broader points to stand.
Perhaps I phrased it poorly; let me put it this way.
If super-advanced aliens suddenly showed up tomorrow and gave us near-physically-perfect technology, machines, techniques, etc., we could feasibly have a fully deterministic encoding of any possible individual human, down to the cell level at least, stored in a box of hard drives or less.
In practical terms I can’t even begin to imagine the technology needed to reliably and repeatably capture a ‘snapshot’ of a living, breathing human’s cellular state, but there’s no equivalent of a light-speed barrier preventing it.
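To sanity-check the “box of hard drives” claim: the ~37 trillion cell figure is a commonly cited estimate, and the bytes-per-cell number is pure assumption, but even so it comes out in the right ballpark.

```python
# Very rough feasibility check (every number here is a loose assumption):
cells = 3.7e13       # common estimate of cells in an adult human body
bytes_per_cell = 32  # assume cell type + quantized position + a little state

snapshot = cells * bytes_per_cell
print(f"{snapshot / 1e15:.1f} PB")    # ~1.2 PB total

drives_needed = snapshot / 20e12      # on 20 TB commodity drives
print(f"{drives_needed:.0f} drives")  # ~60 drives: a box, give or take
```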
How did you estimate the number of possible development scenarios up to the age of 25?
Total number of possible permutations an adult human brain could be in and still remain conscious, over and above that of a baby’s. The most extreme edge cases would be something like Phineas Gage, where a ~1 inch diameter iron rod was rammed through a frontal lobe and he could still walk around.
So I fill in the difference with guesstimation.
I doubt there are literally 1000x more permutations, since there’s already a huge range of possible babies, but I chose it anyway as a nice round number.