I strongly doubt we live in a data-limited AGI timeline
Humans are trained using much less data than Chinchilla
We haven’t even begun to exploit forms of media other than text (Youtube alone is >2OOM bigger)
self-play allows for literally limitless amounts of data
regularization methods mean data constraints aren’t nearly as important as claimed
In the domains where we have exhausted available data, ML models are already weakly superhuman
… for things you can efficiently simulate/efficiently practice on.
I think it’s fairer to say humans were “trained” over millions of years of transfer learning, and an individual human is fine-tuned using much less data than Chinchilla.
Is that fair to say? How much Kolmogorov complexity can evolution encode at a maximum, considering that all information transferred through evolution must fit in a single (stem) cell? Especially when we consider how genetically similar we are to beings that don’t even have brains, I have trouble imagining that the amount of “training data” encoded by evolution is very large.
One thing to note about Kolmogorov complexity is that it is uncomputable. There is no possible algorithm that, given a finite sequence as input, produces a minimum-length program that can reproduce that sequence. Just because something has a Kolmogorov complexity of (say) a few hundred million bits does not at all mean that it can be found by training anything on a few hundred million, or even a few hundred trillion, bits of data.
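One small illustration of that gap (a rough Python sketch; the use of zlib and the example inputs are mine, not from the thread): any compressor gives an upper bound on Kolmogorov complexity, but computing the true value, or actually finding the shortest generating program, is another matter entirely.

```python
import os
import zlib

def k_upper_bound(data: bytes) -> int:
    """Crude upper bound on Kolmogorov complexity: size in bits of a
    zlib-compressed copy. The true value is uncomputable, so a bound
    like this is the best any concrete procedure can do."""
    return 8 * len(zlib.compress(data, 9))

# A very regular string compresses to far fewer bits than its raw length...
print(k_upper_bound(b"ab" * 10_000))
# ...while (pseudo)random bytes barely compress at all. Neither number tells
# you how hard it is to *find* the shortest program behind the data.
print(k_upper_bound(os.urandom(20_000)))
```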
I don’t see the problem. Your learning algorithm doesn’t have to be “very” complicated. It has to work. Machine learning models don’t consist of millions of lines of code. I do see the worry that evolution might not be very good at doing that compression, but I find the argument that lots of bits would actually be needed very unconvincing.
Last time I checked, you could not teach a banana basic arithmetic. Teaching it works for most humans, so obviously evolution did a lot of legwork there.
A lot of the human genome does biochemical stuff like ATP synthesis; those genes we share with bananas. A fair bit goes into hands, etc. The number of genes needed to encode the human brain is fairly small. The file size of the GPT-3 code is also small.
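For a rough sense of scale, a back-of-envelope sketch (the 3.2 billion base-pair figure is the standard estimate, not something from the thread):

```python
# Upper bound on the raw information content of a human genome.
base_pairs = 3.2e9        # approximate haploid genome length
bits_per_base = 2         # four possible bases -> 2 bits each, ignoring redundancy
total_megabytes = base_pairs * bits_per_base / 8 / 1e6
print(f"~{total_megabytes:.0f} MB upper bound")   # ~800 MB
# Only a fraction of that is brain-relevant or differs from other species,
# so the "architecture spec" evolution hands over is far smaller still.
```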
The size of the training data for evolution is immense, even if the number of parameters is not nearly so large. However, those parameters are not equivalent to ML parameters. They’re a mix of software architecture, hardware design, hyperparameters, and probably also some initial patterns of parameters as well. That doesn’t mean you can get the same results with much less data by training some fixed design.
I think humans and current deep learning models are running sufficiently different algorithms that the scaling curves of one don’t apply to the other. This needn’t be a huge difference: convolutional nets are more data-efficient than basic dense nets, for example.
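To make the data-efficiency point concrete, a quick parameter count for one layer on a small image (the layer sizes are illustrative assumptions, not anything from the thread): the convolution’s weight sharing encodes translation invariance, which is exactly the kind of prior that buys data efficiency.

```python
# One layer mapping a 32x32 RGB image to 64 feature channels.
h, w, c_in, c_out, k = 32, 32, 3, 64, 3

dense_params = (h * w * c_in) * (h * w * c_out)  # fully connected on flattened input/output
conv_params = (k * k * c_in) * c_out             # 3x3 convolution, weights shared across positions

print(f"dense: {dense_params:,} weights")  # 201,326,592
print(f"conv:  {conv_params:,} weights")   # 1,728
```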
AIXI, trained on all of Wikipedia, would be vastly superhuman and terrifying. I don’t think we are anywhere close to fundamental data limits. I think we might be closer to the limits of current neural network technology.
Sure, video files are bigger than text files.
Yes, self-play allows for limitless amounts of data, which is why AI can absolutely be crazy good at Go.
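Mechanically, “limitless data from self-play” just means the data generator is a loop you can run for as long as you have compute. A toy sketch for tic-tac-toe (random play standing in for a learned policy, and no training step shown):

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X', 'O', or None for a board given as 9 characters."""
    for i, j, k in LINES:
        if board[i] != "." and board[i] == board[j] == board[k]:
            return board[i]
    return None

def random_policy(board, player):
    return random.choice([i for i, c in enumerate(board) if c == "."])

def self_play_game(policy):
    """One game of the policy against itself; returns (trajectory, result)."""
    board, player, trajectory = list("........."), "X", []
    while winner(board) is None and "." in board:
        move = policy(board, player)
        trajectory.append(("".join(board), move, player))
        board[move] = player
        player = "O" if player == "X" else "X"
    return trajectory, winner(board)

# Every call is a fresh labelled trajectory: the data supply is bounded only
# by compute, which is the whole appeal of self-play in games like Go.
games = [self_play_game(random_policy) for _ in range(1000)]
print(len(games), "games generated")
```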
My model has AIs that are pretty good, potentially superhuman, at every task where we can give the AI a huge pile of relevant data. That does include generating short clickbait videos. It doesn’t include working out advances in fundamental physics, designing a fusion reactor, or making breakthroughs in AI research. I think AIXI trained on Wikipedia would be able to do all those things. But I don’t think the next generation of neural networks will be able to.
Why don’t all of these fall into the self-play category? Physics, software and fusion reactors can all be simulated.
I would be mildly surprised if a sufficiently large language model couldn’t solve all of Project Euler, the Putnam, and the MATH dataset.
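(For calibration, the easy end of that range, e.g. Project Euler’s first problem, is a couple of lines; the hard end is nothing like this.)

```python
# Project Euler, Problem 1: sum of all multiples of 3 or 5 below 1000.
print(sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0))  # 233168
```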
Physics can be simulated, sure. When a human runs a simulation, they are trying to find out useful information. When a neural net is set loose on a simulation, it is trying to game the system. The human actively stays in the regions where the simulation is accurate, and if needed will adjust the parameters of the simulation to improve accuracy. The AI is actively trying to find a design that breaks your simulation. Designing a simulation broad enough to contain the range of systems a human engineer might consider, accurate enough that a solution in the simulation is likely to be a solution in reality, and efficient enough that the AI can blindly thrash towards a solution over millions of trials: that’s hard.
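A toy sketch of that failure mode (everything here, the spring model, the parameter ranges, the random search, is made up for illustration): a blind search over a crudely integrated spring “discovers” that a large timestep makes the integrator diverge, and happily reports a physically meaningless design.

```python
import random

def simulated_peak_speed(stiffness, timestep, steps=200):
    """Naive Euler-style simulation of a unit mass on a spring.
    Reasonable for small timesteps, numerically unstable for large ones."""
    x, v, peak = 1.0, 0.0, 0.0
    for _ in range(steps):
        v += -stiffness * x * timestep
        x += v * timestep
        peak = max(peak, abs(v))
    return peak

# Blind random search "optimizing" the design for peak speed.
candidates = [(random.uniform(0.1, 100.0), random.uniform(1e-4, 1.0)) for _ in range(5000)]
best = max(candidates, key=lambda p: simulated_peak_speed(*p))

print("winning design (stiffness, timestep):", best)
print("claimed peak speed:", simulated_peak_speed(*best))
# The "winner" is a stiffness/timestep combo where the integration blows up,
# not a better spring: the search exploited the simulator, not the physics.
```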
Yes, software can be simulated, but software is a discrete domain: one small modification to highly functional code usually doesn’t work at all. Training a state-of-the-art AI takes a lot of compute. Evolution has been in a position where it was optimizing for intelligence many times; sometimes it produces genuine intelligence, but often it produces a pile of hard-coded special-case hacks that kind of work. Telling whether you have an AI breakthrough is hard. Performance on any particular benchmark can be gamed with a Heath Robinson contraption of special cases.
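The “discrete domain” point is easy to demonstrate: flip one random character in a trivially correct function and see how often the result still works (a contrived sketch; exact numbers vary run to run).

```python
import random
import string

SOURCE = "def add(a, b):\n    return a + b\n"

def still_works(src):
    """True if the mutated source still defines an add() that adds correctly."""
    env = {}
    try:
        exec(src, env)
        return env["add"](2, 3) == 5
    except Exception:
        return False

def mutate(src):
    """Replace one random character with another random printable character."""
    i = random.randrange(len(src))
    return src[:i] + random.choice(string.printable) + src[i + 1:]

trials = 2000
survivors = sum(still_works(mutate(SOURCE)) for _ in range(trials))
print(f"{survivors}/{trials} single-character mutants still work")
# Almost every mutant fails outright: source code is a landscape of cliffs,
# unlike the smooth loss surfaces gradient descent relies on.
```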
Existing quantum field theory can kind of be simulated, for a single proton, at huge computational cost and using a bunch of speed-up tricks specialized to those particular equations.
Suppose the AI proposes an equation of its new multistring harmonic theory. It would take a team of humans years to figure out a computationally tractable simulation. But ignore that and magically simulate it anyway. You now have a simulation of multistring harmonic theory. You set it up with a random starting position and simulate. Let’s say you get a proton. How do you recognise that the complicated combination of knots is indeed a proton? You can’t just measure its mass; mass isn’t fundamental in multistring harmonic theory. Mass is just the average rate a particle emits massules divided by its intrauniverse bushiness coefficient. Or maybe the random thing you land on is a magnetic monopole, or some other exotic thing we never knew existed.
Let’s take a concrete example.
Assume you have an AI that could get 100% on every Putnam exam. Do you think it would be reasonable or not to assume such an AI would also display superhuman performance at solving the Yang-Mills mass gap problem?
Producing machine-verifiable formal proofs is an activity somewhat amenable to self-play. To the extent that some parts of physics are reducible to ZFC oracle queries, maybe an AI can solve those.
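For concreteness, “machine-verifiable” means something like the following Lean 4 snippet (a deliberately trivial theorem, chosen just to show the shape): the kernel either accepts the proof term or rejects it, with no woolliness for a self-play loop to argue about.

```lean
-- The checker either accepts this proof term or it doesn't.
theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```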
To do something other than produce ZFC proofs, the AI must learn what real, in-practice maths looks like. To do this, it needs large amounts of human-generated mathematical content. It is plausible that the translation from formal maths to human maths is fairly simple, and that there are enough maths papers available for the AI to roughly learn it.
The Putnam archive consists of 12 questions × 20 years = 240 questions, spread over many fields of maths. That is not big data; you can’t train a neural net to do much with just 240 examples. If aliens gave us a billion similar questions (with answers), I don’t doubt we could make an AI that scores 100% on the Putnam. Still, it is plausible that enough maths could be scraped together to roughly learn the relation from ZFC to human maths, and such an AI could be fine-tuned on some dataset similar to the Putnam and then do well on it (especially if the examiner is forgiving of strange, formulaic phrasings).
The Putnam problems are unwooly. I suspect such an AI couldn’t take in the web page you linked and produce a publishable paper solving the Yang-Mills mass gap. Given a physicist who understood the question and was also prepared to dive into ZFC (or Lean, or some other formal system) formulae, I suspect such an AI could be useful. Even if the physicist doesn’t look at the ZFC but does a fair bit of hand-holding, they probably succeed. I am assuming the AI is just magic at ZFC; that’s the self-play part. The thing I think is hard to learn is the link from the woolly gesturing to the ZFC. So with a physicist there to make the question unambiguous, to cherry-pick and paste together the answers, and generally to polish a mishmash of theorems into a more flowing narrative, that would work. I’m not sure how much hand-holding would be needed. I’m also not sure you get your Putnam bot to work in the first place.
Sure, but the post sets up a hypothetical, so it invites developing that hypothetical rather than denying it, however implausible it may be.
I think scaling up the generation of data that’s actually useful for more than robustness in language/multimodal models is the only remaining milestone before AGI. Learn from your effortful, multi-step thoughts about naturally sourced data, not just from the data itself. Alignment of this generated data is what makes or breaks the future. The current experiments are much easier, because naturally sourced data is about as aligned as it gets and you just need to use it correctly, while generated data could systematically shift the targets of generalization.