I’ve done some experiments along those lines previously for non-o1 models and found the same. I’m mildly surprised o1 cannot handle it, but not enormously.
I increasingly suspect “humans are general because of the data, not the algorithm” is true and will remain true for LLMs. You can have amazingly high performance on domain X, but very low performance on “easy” domain Y, and this just keeps being true to arbitrary levels of “intelligence”; Karpathy’s “jagged intelligence” is true of humans and keeps being true all the way up.
Interesting statement. Could you expand a bit on what you mean by this?
So the story goes like this: there are two ways people think of “general intelligence.” Fuzzy frame upcoming that I do not fully endorse.
1. General Intelligence = (general learning algorithm) + (data)
2. General Intelligence = (learning algorithm) + (general data)
It’s hard to describe all the differences here, so I’m just going to enumerate some ways people approach the world differently, depending on the frame.
Seminal text for the first is The Power of Intelligence, which attributes general problem solving entirely to the brain. Seminal text for the second is The Secret of Our Success, which points out that without the load of domain-specific culture, human problem solving is shit.
When the first think of the moon landing, they think “Man, look at that out-of-domain problem solving, that lets a man who evolved in Africa walk on the moon.” When the second think of the moon landing, they think of how human problem solving is so situated that we needed to not just hire the Nazis who had experience with rockets but put them in charge.
The first thinks of geniuses as those with a particularly high dose of General Intelligence, which is why they solved multiple problems in multiple domains (like Einstein, and Newton did). The second thinks of geniuses as slightly smarter-than-average people who probably crested a wave of things that many of their peers might have figured out… and who did so because they were more stubborn, such that eventually they would endorse dumb ideas with as much fervor as they did their good ones (like Einstein, and Newton did).
First likes to make analogies of… intelligence to entire civilizations. Second thinks that’s cool, but look—civilization does lots of things brains empirically don’t, so maybe civilization is the problem-solving unit generally? Like the humans who walked on the moon did not, in fact, get their training data from the savannah, and that seems pretty relevant.
First… expects LLMs to not make it, because they are bad at out-of-domain thinking, maybe. Second is like, sure, LLMs are bad at out-of-domain thinking. So are humans, so what? Spiky intelligence and so on. Science advances not in one mind, but with the funeral of each mind. LLMs lose plasticity as they train. Etc.
Thank you for the reply!
I’ve actually come to a remarkably similar conclusion as described in this post. We’re phrasing things differently (I called it the “myth of general intelligence”), but I think we’re getting at the same thing. The Secret of Our Success has been very influential on my thinking as well.
This is also my biggest point of contention with Yudkowsky’s views. He seems to suggest (for example, in this post) that capabilities are gained from being able to think well and a lot. In my opinion he vastly underestimates the amount of data/experience required to make that possible in the first place, for any particular capability or domain. This speaks to the age-old (classical) rationalism vs empiricism debate, where Yudkowsky seems to sit on the rationalist side, whereas it seems you and I would lean more to the empiricist side.
I think The Secret of Our Success goes too far, and I’m less willing to rely on it than you, but I do think it got at least a significant share of how humans learn right (like 30-50% at minimum).
It might just be a perception problem. LLMs don’t really seem to have a good understanding yet of one letter being next to another, or of what a diagonal is. If you look at ARC-AGI with o3, you see it doing worse as the grid gets larger, while humans don’t have the same drawback.
EDIT: Tried it on o1 pro just now. Doesn’t seem like a perception problem, but it still could be. I wonder if it’s related to being a successful agent. It might not properly model the effect of a sequence of actions on the state of the world yet. It’s strange that this isn’t unlocked with reasoning.
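To make the perception point concrete, here’s a minimal sketch (my own illustration, not something from the benchmark or the models themselves) of why “next to” and “diagonal” might be hard: once an ARC-style grid is serialized row by row into text, vertical and diagonal neighbours end up far apart in the token sequence, and that distance grows with grid width.

```python
# Toy illustration (an assumption about why larger grids hurt, not an
# established explanation): row-major serialization turns spatial
# adjacency into long-range dependencies in the token sequence.

def serialize(grid):
    """Flatten a 2D grid into roughly the text an LLM would see."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def vertical_gap(width):
    """Distance in the flat sequence between a cell and the cell directly
    below it, assuming one token per cell plus one per row separator."""
    return width + 1

tiny = [[0, 1, 0],
        [1, 0, 1]]
print(serialize(tiny))  # the model only ever sees this 1D string

for w in (5, 10, 30):
    print(f"width {w:2d}: vertical neighbours ~{vertical_gap(w)} tokens apart")
```

Humans get the 2D structure for free from vision, so their difficulty doesn’t scale with grid width the same way, which would fit the o3-on-large-grids observation.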