I don’t completely understand your point because I don’t have a calibration for your “slow” in “training an AI must be slow”. How slow is “slow”? Compared to what? (Leaving aside Solomonoff inductors and other incomputable things.)
Do you take the usual observation that “a toddler requires fewer examples” as the reference for “not slow”? If so: human DNA is < 1GB, so humans get at most 1GB of free knowledge as inductive bias. Does your argument for “AI slow” then rely on us not getting at that <1GB of stuff to preconfigure in an ML system? If not (i.e., humans are slow too): do you think humans are a ceiling, or close to one, on data efficiency?
You’re right that my points lack a certain rigor. I don’t think there is a rigorous answer to questions like “what does slow mean?”.
However, there is a recurring theme I’ve seen in discussions about AI where people express incredulity about neural networks as a method for AGI since they require so much “more data” than humans to train. My argument was merely that we should expect things to take a lot of data, and situations where they don’t are illusory. Maybe that’s less common in this space, so I should have framed it differently. But I wrote this mostly to put it out there and get people’s thoughts.
Also, I see your point about DNA only accounting for 1GB. I wasn’t aware it was so low. I think it’s interesting and suggests the possibility of smaller learning systems than I envisioned, but that’s as much a question about compression as anything else. Don’t forget that DNA still needs to be “uncompressed” into a human, and at least some of that process uses information stored in the previous generation of humans. Admittedly, it’s not clear how much that last part accounts for, but there is evidence that part of a baby’s development is determined by the biological state of the mother.
But I guess I would say my argument does rely on us not getting at that <1GB of stuff, with the caveat that that 1GB is extremely highly compressed and takes a very complex system to uncompress.
I should add as well that I definitely don’t believe that LLMs are remotely efficient, and I wouldn’t necessarily be surprised if humans are about as close to the maximum data efficiency as possible. I wouldn’t be surprised if they weren’t, either. But we were built over millions (billions?) of years under conditions that put a very high price tag on inefficiency, so it seems reasonable to believe our data efficiency is at least at some type of local optimum.
EDIT: Another way to phrase the point about DNA: you need to account not just for the storage size of the DNA, but also the Kolmogorov complexity of turning that into a human. No idea if that adds a lot to its size, though.
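To make that EDIT concrete, here’s a toy sketch of the accounting (purely illustrative: zlib stands in for the developmental process, and none of the numbers mean anything biologically). The point is just that an honest description length charges for the expander program as well as the compressed payload.

```python
import inspect
import zlib

def develop(genome_blob: bytes) -> bytes:
    """Stand-in for the developmental process: expands the compact encoding."""
    return zlib.decompress(genome_blob)

# A highly repetitive stand-in "organism" compresses extremely well...
organism = b"cells and tissues and organs " * 100_000
genome_blob = zlib.compress(organism, 9)

# ...but a fair description-length count also includes the program that
# expands the payload, not just the payload itself.
expander_bytes = len(inspect.getsource(develop).encode())
print(f"compressed payload:         {len(genome_blob):,} bytes")
print(f"payload + expander program: {len(genome_blob) + expander_bytes:,} bytes")
```

In this toy case the expander is tiny compared to the payload; the open question in the thread is whether the biological “expander” is similarly negligible relative to the genome or not.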
1GB for DNA is a lower bound. That’s how much it takes to store the abstract base pair representation. There’s lots of other information you’d need to actually build a human and a lot of it is common to all life. Like, DNA spends most of its time not in the neat little X shapes that happen during reproduction, but in coiled up little tangles. A lot of the information is stored in the 3D shape and in the other regulatory machinery attached to the chromosomes.
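For what it’s worth, the back-of-envelope behind that figure (assuming ~3.1 billion base pairs at 2 bits per base, i.e. just the bare sequence, with none of the packaging or regulatory state counted):

```python
# Bare base-pair sequence only: 2 bits per base (A/C/G/T), ~3.1 billion bases.
base_pairs = 3.1e9
gigabytes = base_pairs * 2 / 8 / 1e9
print(f"~{gigabytes:.2f} GB for the raw sequence")  # ~0.78 GB, hence the "< 1GB"
```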
If all you had was a human genome, the best you could do would be to do a lot of simulation to reconstruct all the other stuff. Probably doable, but would require a lot of “relearning.”
The brain also uses DNA for storing information in the form of methylation patterns in individual neurons.
I expect that the mother does not add much information on top of the DNA; so yes, the process is complex and necessary, but I think you have to count almost only the size of the DNA as inductive bias. That said, this is a gut guess!
However, there is a recurring theme I’ve seen in discussions about AI where people express incredulity about neural networks as a method for AGI since they require so much “more data” than humans to train. My argument was merely that we should expect things to take a lot of data, and situations where they don’t are illusory.
Yeah I got this, I have the same impression. The way I think about the topic is: “The NN requires tons of data to learn human language because it’s a totally alien mind, while humans produced their language themselves, so it’s tautologically adapted to their base architecture; you learn it easily only because it’s designed to be learned by you”.
But after encountering the DNA size argument myself a while ago, I started doubting this framework. It may be possible to do much, much better than what we do now.
Yeah, I agree that it’s a surprising fact requiring a bit of updating on my end. But I think the compression point probably matters more than you would think, and I’m finding myself more convinced the more I think about it. A lot of processing goes into turning that 1GB into a brain, and that processing may not be highly reducible. That’s sort of what I was getting at, and I’m not totally sure the complexity of that process wouldn’t add up to a lot more than 1GB.
It’s tempting to think of DNA as sufficiently encoding a human, but (speculatively) it may make more sense to think of DNA only as the input to a very large function which outputs a human. It seems strange, but it’s not like anyone’s ever built a human (or any other organism) in a lab from DNA alone; it’s definitely possible that there’s a huge amount of information stored in the processes of a living human which isn’t sufficiently encoded just by DNA.
You don’t even have to zoom out to things like organs or the brain. Just knowing which base triplets (codons) map to which amino acids is an (admittedly simple) example of processing that exists outside of the DNA encoding itself.
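As a concrete illustration of that last point (a toy sketch with only a handful of the 64 entries of the standard genetic code): the codon-to-amino-acid table lives in the cell’s translation machinery, not in the sequence being read.

```python
# A few entries of the standard genetic code (DNA codons -> amino acids).
# The full table, and the machinery that applies it, is knowledge the raw
# sequence does not carry about itself.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "TGG": "Trp",
    "TTT": "Phe",
    "AAA": "Lys",
    "GCT": "Ala",
    "GGC": "Gly",
    "TAA": "STOP",
}

def translate(dna: str) -> list[str]:
    """Translate codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" = not in this toy table
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

print(translate("ATGTTTGGCAAATAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```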
Even if you include a very generous epigenetic and womb-environmental component 9x bigger than the DNA component, any possible human baby at birth would need less than 10 GB to describe completely at DNA levels of compression.
A human adult at age 25 would probably need a lot more to cover all possible development scenarios, but even then I can’t see it being more than 1000x, so 10TB should be enough.
For reference, Windows Server 2016 supports 24 TB of RAM, and many petabytes of attached storage.
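Spelling out the arithmetic of that estimate (the 9x and 1000x multipliers are, of course, exactly the guesses under debate here):

```python
dna_gb = 1                      # ~1 GB for the genome itself (see upthread)
baby_gb = dna_gb + 9 * dna_gb   # generous 9x epigenetic / womb component -> 10 GB
adult_gb = baby_gb * 1000       # guessed 1000x for development up to age 25
print(f"baby at birth: ~{baby_gb} GB")
print(f"adult at 25:   ~{adult_gb / 1000:.0f} TB")  # ~10 TB
```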
I think you’re broadly right, but I think it’s worth mentioning that DNA is a probabilistic compression (evidence: the differences between identical twins), so it gets weird when you talk about compressing an adult at age 25 - what does probabilistic compression even mean at that point?
But I think you’ve mostly convinced me. Whatever it takes to “encode” a human, it’s possible to compress it to be something very small.
A minor nitpick: DNA, the encoding concept, is not probabilistic; it’s everything surrounding it, such as the packaging, 3D shape, epigenetic marks, etc., plus random mutations, transcription errors, etc., that causes identical twins to deviate.
Of course, it is so compact because it doesn’t bother spending many ‘bits’ on ancillary capabilities to correct operating errors.
But it’s at least theoretically possible for it to be deterministic under ideal conditions.
To that first sentence, I don’t want to get lost in semantics here. My specific statement is that the process that takes DNA into a human is probabilistic with respect to the DNA sequence alone. Add in all that other stuff, and maybe at some point it becomes deterministic, but at that point you are no longer discussing the <1GB that makes up DNA. If you wanted to be truly deterministic, especially up to the age of 25, I seriously doubt it could be done in less than millions of petabytes, because there is such a huge number of minuscule variations in conditions, and I suspect human development is a highly chaotic process.
As you said, though, we’re at the point of minor nitpicks here. It doesn’t have to be a deterministic encoding for your broader points to stand.
Perhaps I phrased it poorly; let me put it this way.
If super-advanced aliens suddenly showed up tomorrow and gave us near-physically-perfect technology, machines, techniques, etc., we could feasibly have a fully deterministic, down to the cell level at least, encoding of any possible individual human stored in a box of hard drives or less.
In practical terms I can’t even begin to imagine the technology needed to reliably and repeatably capture a ‘snapshot’ of a living, breathing human’s cellular state, but there’s no equivalent of a light-speed barrier preventing it.
How did you estimate the number of possible development scenarios up to age 25?
The total number of possible permutations an adult human brain could be in and still remain conscious, over and above that of a baby’s. The most extreme edge cases would be something like Phineas Gage, who had a ~1 inch diameter iron rod rammed through a frontal lobe and could still walk around afterwards.
So I filled in the difference with guesstimation.
I doubt there’s literally 1000x more permutations, since there’s already a huge range of possible babies, but I chose it anyway as a nice round number.