I think it’s fairer to say humans were “trained” over millions of years of transfer learning, and an individual human is fine-tuned using much less data than Chinchilla.
Is that fair to say? How much Kolmogorov complexity can evolution encode at most, given that all information transferred through evolution must fit in a single (stem) cell? Especially considering how genetically similar we are to beings that don’t even have brains, I have trouble imagining that the amount of “training data” encoded by evolution is very large.
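To put a rough upper bound on it: the human genome is about 3.1 billion base pairs, and each base pair carries at most 2 bits, so everything evolution can pass on through that single cell fits in well under a gigabyte. A quick back-of-envelope, with the (rough, order-of-magnitude) assumptions spelled out in the comments:

```python
# Back-of-envelope upper bound on the information evolution can pass on
# through a single cell's genome. Numbers are rough, order-of-magnitude only.

BASE_PAIRS = 3.1e9          # approximate length of the human genome
BITS_PER_BASE_PAIR = 2      # 4 possible bases -> at most 2 bits each

total_bits = BASE_PAIRS * BITS_PER_BASE_PAIR
total_megabytes = total_bits / 8 / 1e6

print(f"Upper bound: {total_bits:.2e} bits, about {total_megabytes:.0f} MB")
# ~6.2e9 bits, on the order of 750 MB -- and the brain-specific part is
# presumably a small fraction of this, since most of the sequence is shared
# with brainless organisms or not under strong selection.
```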
One thing to note about Kolmogorov complexity is that it is uncomputable. There is no possible algorithm that, given a finite sequence as input, produces a minimum-length program that reproduces that sequence. Just because something has a Kolmogorov complexity of (say) a few hundred million bits does not at all mean that it can be found by training anything on a few hundred million, or even a few hundred trillion, bits of data.
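As a minimal illustration of that gap: in practice we can only ever compute upper bounds on Kolmogorov complexity, for instance by running a general-purpose compressor, and those bounds can be wildly loose. The sketch below (ordinary Python, with zlib as a stand-in compressor) builds data whose true generating program is just a few lines long, yet which a generic compressor cannot shrink at all:

```python
import hashlib
import zlib

def kolmogorov_upper_bound(data: bytes) -> int:
    """Size of a zlib-compressed encoding of `data` (level 9).

    This is only an *upper bound* on Kolmogorov complexity: the true shortest
    program may be far smaller, and no algorithm can compute it in general.
    """
    return len(zlib.compress(data, 9))

# Data with a tiny generating program that nonetheless looks structureless
# to a generic compressor: iterated SHA-256 of a short seed.
block, chunks = b"seed", []
for _ in range(4096):
    block = hashlib.sha256(block).digest()
    chunks.append(block)
data = b"".join(chunks)   # 4096 * 32 = 131072 bytes

print(len(data), kolmogorov_upper_bound(data))
# The "bound" stays close to the raw size, even though the true Kolmogorov
# complexity is roughly the length of this snippet.
```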
I don’t see the problem. Your learning algorithm doesn’t have to be “very” complicated. It has to work. Machine learning models don’t consist of millions of lines of code. I do see the concern that evolution might not be very good at doing that compression, but I find the argument that lots of bits would actually be needed very unconvincing.
Last time I checked, you could not teach a banana basic arithmetic. This works for most humans, so obviously evolution did lots of leg work there.
A lot of the human genome does biochemical stuff like ATP synthesis; those genes we share with bananas. A fair bit more goes into hands, etc. The number of genes needed to encode the human brain is fairly small. The file size of the GPT-3 code is also small.
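For a sense of scale on that last point: the code specifying a GPT-style architecture is a page or two of definitions like the sketch below, while the trained weights of a 175B-parameter model run to hundreds of gigabytes. This uses PyTorch, and the dimensions are illustrative rather than GPT-3’s actual configuration:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: self-attention + MLP with residuals.

    The point is the size of the *specification*: a GPT-style model is just a
    stack of blocks like this plus embeddings. Dimensions here are
    illustrative, not GPT-3's real configuration.
    """

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

block = TransformerBlock()
print(sum(p.numel() for p in block.parameters()))  # roughly 7 million per block
```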
The size of the training data for evolution is immense, even if the number of parameters is not nearly so large. However, those parameters are not equivalent to ML parameters: they’re a mix of software architecture, hardware design, hyperparameters, and probably some initial weight patterns as well. That doesn’t mean you can get the same results from much less data by training some fixed design.
I think humans and current deep learning models are running sufficiently different algorithms that the scaling curves of one don’t apply to the other. The difference needn’t be huge: convolutional nets, for example, are more data-efficient than plain dense nets.
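A quick way to see where that data efficiency comes from is to count parameters: a convolution reuses one small kernel at every spatial position, so it needs orders of magnitude fewer weights than a dense layer on the same input. A toy comparison in plain Python (layer sizes chosen only for illustration):

```python
# Parameter counts for one layer mapping a 64-channel 32x32 feature map
# to another 64-channel 32x32 feature map. Sizes are illustrative only.

channels, height, width, kernel = 64, 32, 32, 3

# Dense layer: every output unit connects to every input unit.
n_in = n_out = channels * height * width
dense_params = n_in * n_out + n_out                  # weights + biases

# Conv layer: one 3x3 kernel per (in-channel, out-channel) pair,
# reused at every spatial position -- that reuse is the inductive bias.
conv_params = channels * channels * kernel * kernel + channels

print(f"dense: {dense_params:,}")   # ~4.3 billion
print(f"conv : {conv_params:,}")    # ~37 thousand
```

Fewer free parameters with the right inductive bias generally means fewer samples are needed to pin them down, which is the sense in which the two architectures sit on different scaling curves.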