I am a Computer Science PhD who has worked in Machine Learning at both Amazon and Google Brain.
I have a blog at https://onemanynone.substack.com/ where I publish posts aimed at a broader and less technical audience.
OneManyNone
Yeah, I agree that it’s a surprising fact requiring a bit of updating on my end. But I think the compression point probably matters more than you would think, and I’m finding myself more convinced the more I think about it. A lot of processing goes into turning that 1GB into a brain, and that processing may not be highly reducible. That’s sort of what I was getting at, and I’m not totally sure the complexity of that process wouldn’t add up to a lot more than 1GB.
It’s tempting to think of DNA as sufficiently encoding a human, but (speculatively) it may make more sense to think of DNA only as the input to a very large function which outputs a human. It seems strange, but it’s not like anyone’s ever built a human (or any other organism) in a lab from DNA alone; it’s definitely possible that there’s a huge amount of information stored in the processes of a living human which isn’t sufficiently encoded just by DNA.
You don’t even have to zoom out to things like organs or the brain. Just knowing which base triplets (codons) map to which amino acids is an (admittedly simple) example of processing that exists outside of the DNA encoding itself.
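To make that concrete, here’s a toy sketch (Python, purely illustrative; it skips transcription and includes only a handful of the 64 real codons). The point is that the lookup table and the reading procedure are stand-ins for machinery that lives in the cell, not in the DNA string being read.

```python
# Toy illustration: the codon-to-amino-acid mapping is information that lives in
# the cell's translation machinery (ribosomes, tRNA), not in the DNA string itself.
# Only a handful of codons shown; the real table has 64 entries.
CODON_TABLE = {
    "ATG": "Met",   # also the start codon
    "TTT": "Phe",
    "AAA": "Lys",
    "GGG": "Gly",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def translate(dna: str) -> list[str]:
    """Read a DNA string three bases at a time and map each codon to an amino acid."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "?")
        if aa == "STOP":
            break
        protein.append(aa)
    return protein

print(translate("ATGTTTAAAGGGTAA"))  # ['Met', 'Phe', 'Lys', 'Gly']
```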
To your point about the particle filter, my whole point is that you can’t just assume the super intelligence can generate an infinite number of particles, because that takes infinite processing. At the end of the day, superintelligence isn’t magic—those hypotheses have to come from somewhere. They have to be built, and they have to be built sequentially. The only way you get to skip steps is by reusing knowledge that came from somewhere else.
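For what it’s worth, here’s a minimal particle-filter sketch (Python, toy numbers throughout) just to make the budget explicit: every hypothesis has to be generated, propagated, and scored, and `N_PARTICLES` is exactly the amount of processing you’re willing to pay for.

```python
import numpy as np

# Minimal 1-D particle filter sketch: N_PARTICLES is a hard compute budget.
# Every hypothesis (particle) must be generated, moved, and weighted; none of
# that is free, and you cannot simply assume it away with "more particles".
rng = np.random.default_rng(0)
N_PARTICLES = 1_000          # the budget of hypotheses we can afford
true_state = 0.0

particles = rng.normal(0.0, 5.0, N_PARTICLES)           # initial hypotheses
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

for _ in range(20):
    true_state += 1.0                                    # world moves
    obs = true_state + rng.normal(0.0, 1.0)              # noisy observation

    particles += 1.0 + rng.normal(0.0, 0.5, N_PARTICLES)  # predict step
    weights *= np.exp(-0.5 * (obs - particles) ** 2)       # weight by likelihood
    weights /= weights.sum()

    # Resample: concentrate the limited budget on hypotheses that are working.
    idx = rng.choice(N_PARTICLES, N_PARTICLES, p=weights)
    particles, weights = particles[idx], np.full(N_PARTICLES, 1.0 / N_PARTICLES)

print(f"estimate: {particles.mean():.2f}, truth: {true_state:.2f}")
```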
Take a look at the game of Go. The computational limits on the number of games that could be simulated made this “try everything” approach essentially impossible. When Go was finally “solved”, it was with an ML algorithm that proposed only a limited number of possible sequences—it was just that the sequences it proposed were better.
But how did it get those better moves? It didn’t pull them out of the air; it used abstractions it had accumulated from playing a huge number of games.
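Schematically, the difference looks something like this (Python sketch; `policy_probs` is a stand-in for a learned policy network’s output, not AlphaGo’s actual interface):

```python
import numpy as np

# Sketch of "try everything" vs. "propose a few good candidates".

def brute_force_candidates(legal_moves):
    # Exhaustive: with a branching factor around 250, deep search over all of
    # these is computationally hopeless.
    return list(legal_moves)

def policy_guided_candidates(legal_moves, policy_probs, k=8):
    # Learned prior: only the k most promising moves get simulated further.
    # The skill is in how good those k proposals are, and that skill was
    # bought with a huge number of prior games.
    order = np.argsort(policy_probs)[::-1]
    return [legal_moves[i] for i in order[:k]]

legal_moves = [f"move_{i}" for i in range(250)]
policy_probs = np.random.dirichlet(np.ones(250))   # stand-in for network output

print(len(brute_force_candidates(legal_moves)))             # 250 branches per ply
print(policy_guided_candidates(legal_moves, policy_probs))  # only 8 branches
```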
I do agree with some of the things you’re saying about architecture, though. Sometimes inductive bias imposes limitations: it can, and often does, put hard limits on which hypotheses you can consider at all.
I also admit I was wrong and was careless in saying that inductive bias is just information you started with. But I don’t think it’s imprecise to say that “information you started with” is just another form of inductive bias, of which “architecture” is another.
But at a certain point, the line between architecture and information is going to blur. As I’ve pointed out, a transformer without some of the explicit benefits of a CNN’s architecture can still structure itself in a way that learns shift invariance. I also don’t think any of this affects my key arguments.
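To illustrate what the CNN gets for free (a minimal numpy sketch, 1-D instead of images): weight sharing makes translation equivariance a property of the architecture itself, before any training happens, whereas a transformer has to spend data learning the equivalent behavior.

```python
import numpy as np

# Convolution's weight sharing makes translation equivariance an architectural
# fact: shift the input and the output shifts with it, no training required.
def conv1d(x, w):
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(len(x) - len(w) + 1)])

signal = np.array([0., 0., 1., 2., 3., 0., 0., 0., 0.])
shifted = np.roll(signal, 2)          # same pattern, two positions later
kernel = np.array([1., -1., 0.5])     # arbitrary, untrained filter weights

out_a = conv1d(signal, kernel)
out_b = conv1d(shifted, kernel)

# The response to the shifted input is (up to the edges) the shifted response.
print(np.allclose(out_a[:-2], out_b[2:]))   # True
```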
Yes, I wasn’t sure if it was wise to use TSP as an example for that reason. Originally I wrote it using the Hamiltonian Path problem, but thought a non-technical reader would grasp TSP more quickly. Maybe that was a mistake. It also seems I may have underestimated how technical my audience would be.
But your point about heuristics is right. That’s basically what I think an AGI based on LLMs would do to figure out the world. However, I doubt there would be one heuristic which could approximate Solomonoff induction in all scenarios, or even most. Which means you’d have to select the right one, which means you’d need a selection criterion, which takes us back to my original points.
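As a toy picture of what “selecting the right one” already costs (Python sketch; the heuristics are deliberately silly stand-ins): you need a pool of candidates, a scoring rule, and compute spent running both.

```python
import numpy as np

# No single cheap heuristic works everywhere, so you need a selection rule,
# and the selection rule is itself more computation.
heuristics = {
    "persistence": lambda history: history[-1],                 # tomorrow == today
    "trend":       lambda history: 2 * history[-1] - history[-2],
    "mean":        lambda history: float(np.mean(history)),
}

def pick_heuristic(history, horizon=5):
    """Score each heuristic on recent one-step predictions; return the best name."""
    scores = {}
    for name, h in heuristics.items():
        errs = [
            (h(history[:t]) - history[t]) ** 2
            for t in range(len(history) - horizon, len(history))
        ]
        scores[name] = float(np.mean(errs))
    return min(scores, key=scores.get)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # a simple upward trend
print(pick_heuristic(data))   # 'trend' wins here; different data favors others
```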
You’re right that my points lack a certain rigor. I don’t think there is a rigorous answer to questions like “what does slow mean?”.
However, there is a recurring theme I’ve seen in discussions about AI where people express incredulity about neural networks as a method for AGI since they require so much “more data” than humans to train. My argument was merely that we should expect things to take a lot of data, and that situations where they don’t are illusory. Maybe that’s less common in this space, so I should have framed it differently. But I wrote this mostly to put it out there and get people’s thoughts.
Also, I see your point about DNA only accounting for 1GB. I wasn’t aware it was so low. I think it’s interesting and suggests the possibility of smaller learning systems than I envisioned, but that’s as much a question about compression as anything else. Don’t forget that the DNA still needs to be “uncompressed” into a human, and at least some of that process uses information stored in the previous generation of humans. Admittedly, it’s not clear how much that last part accounts for, but there is evidence that part of a baby’s development is determined by the biological state of the mother.
But I guess I would say my argument does rely on us not getting that <1GB of stuff, with the caveat that the 1GB is so heavily compressed that uncompressing it requires a very complex system.
I should add as well that I definitely don’t believe that LLMs are remotely efficient, and I wouldn’t necessarily be surprised if humans are as close to the maximum on data efficiency as possible. I wouldn’t be surprised if they weren’t, either. But we were built over millions (billions?) of years under conditions that put a very high price tag on inefficiency, so it seems reasonable to believe our data efficiency is at least at some kind of local optimum.
EDIT: Another way to phrase the point about DNA: you need to account not just for the storage size of the DNA, but also the Kolmogorov complexity of the process that turns that DNA into a human. No idea if that adds a lot to its size, though.
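Roughly, as a back-of-envelope (the numbers are the usual textbook ones, so treat them as approximate):

```latex
% ~3 billion base pairs at 2 bits per base is where the "< 1 GB" figure comes from.
\ell(\text{DNA}) \approx 3\times 10^{9}\ \text{bp} \times 2\ \text{bits/bp}
                 \approx 7.5\times 10^{8}\ \text{bytes}

% What the argument needs is the complexity of the "decompressor" as well:
K(\text{human}) \;\le\; \ell(\text{DNA}) + K(\text{developmental machinery}) + O(1)
```

The second term is the one my argument cares about, and the inequality alone doesn’t tell us how large it is.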
In the context of his argument I think the claim is reasonable, since I interpreted it as saying that, because it can already be used as a tool that designs plans, it has already overcome the biggest challenge of being an agent.
But if we take that claim out of context and interpret it literally, then I agree that it’s not a justified statement per se. It may be able to simulate a plausible causal explanation, but I think that is very different from actually knowing it. As long as you only have access to partial information, there are theoretical limits to what you can know about the world. But it’s hard to think of contexts where that gap would matter a lot.
I think there’s definitely some truth to this sometimes, but I don’t think you’ve correctly described the main driver of genius. I actually think it’s the opposite: my guess is that there’s a limit to thinking speed, and genius exists precisely because some people just have better thoughts. Even von Neumann himself attributed much of his ability to intuition. He would go to sleep and in the morning he would have the answer to whatever problem he was toiling over.
I think, instead, that ideas for the most part emerge through some deep and incomprehensible heuristics in our brains. Think about a chess master recognizing the next move at just a glance. However much training it took to give him that ability, he is not doing a tree search at that moment. It’s not hard to imagine a hypothetical where his brain, with no training, came preconfigured to make the same decisions, and indeed I think that’s more or less what happens with chess prodigies. They don’t come preconfigured, but their brains are better primed to develop those intuitions.
In other words, I think that genius is making better connections with the same number of “cycles”, and I think there’s evidence that LLMs do this too as they advance. For instance, part of the significance of DeepMind’s Chinchilla paper was that, for the same compute budget, training a smaller network on more data gave better performance. The only explanation for this is that the quality of the processing improved enough to counteract the loss in quantity.
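The rough shape of the tradeoff, as I understand the paper (reproduced from memory, so take the details as approximate):

```latex
% Parametric loss fitted in Hoffmann et al. (2022); N = parameters, D = training tokens.
% Exact constants omitted; the point is the form of the tradeoff.
L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}

% With compute C roughly proportional to N \cdot D and the fitted \alpha, \beta of
% similar size, a smaller N with a larger D can reach a lower loss at the same C,
% which is how the smaller Chinchilla outperformed the larger Gopher.
```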
I guess so? I’m not sure what point you’re making, so it’s hard for me to address it.
My point is that if you want to build something intelligent, you have to do a lot of processing and there’s no way around it. Playing several million games of Go counts as a lot of processing.