OK, so in your picture chimps had less training / less scale / worse arch than humans, and this is related to the fact that humans have language and chimps don’t. “Scale alone leads to new capabilities.”
But if we explore the regime of “even more training than humans / even more scale than humans / even better arch than humans”, your claim is that this whole regime is just a giant dead zone where nothing interesting happens, and thus you’re just being inefficient—really you should have split it into multiple smaller models. Correct? If so, why do you think that?
In other words, if scaling up from chimp brains to human brains unlocked new capabilities (namely language), why shouldn’t scaling up from human brains to superhuman brains unlock new capabilities too? Do you think there are no capabilities left, or something?
(Sorry if you’ve already talked about this elsewhere.)
> OK, so in your picture chimps had less training / less scale / worse arch than humans, and this is related to the fact that humans have language and chimps don’t. “Scale alone leads to new capabilities.”
Scale in compute and data, per NN scaling laws. Language/culture/tech created a new effective-data scaling regime, which quickly reconfigured the Pareto payoff surface for brain size, so it’s more of a feedback loop than a clean cause-and-effect relationship (which is why I’d consider it a foom on evolutionary timescales).
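A minimal sketch of the scaling-law form being referenced, assuming a Chinchilla-style parametric loss in which pretraining loss falls as a power law in both parameter count and data; the coefficient values below are illustrative placeholders, not fitted numbers:

```python
# Chinchilla-style parametric scaling law: loss falls as a power law in both
# parameter count N and training tokens D. All coefficients here are
# illustrative placeholders, not published fits.

def loss(n_params: float, n_tokens: float,
         e_irreducible: float = 1.7, a: float = 400.0, b: float = 410.0,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss L(N, D) = E + A / N^alpha + B / D^beta."""
    return e_irreducible + a / n_params**alpha + b / n_tokens**beta

# Scaling either axis alone hits diminishing returns; to keep improving,
# compute and data have to grow together.
print(loss(7e10, 1.4e12))  # large model, large corpus
print(loss(7e10, 1.4e10))  # same model, 100x less data: data-bottlenecked
```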
> In other words, if scaling up from chimp brains to human brains unlocked new capabilities (namely language), why shouldn’t scaling up from human brains to superhuman brains unlock new capabilities too?
Of course, but the new capabilities are more like new skills, mental programs, and wisdom, not metasystems transitions (changes to the core scaling regime).
A metasystems transition would be something as profound, rare, and important as the transition from effective lifetime training data being a constant to effective lifetime data scaling with population size, or the transition from non-programmable to programmable.
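As a toy sketch of that first transition, here are the two data regimes side by side; the lifetime-data constant and the transmission-efficiency parameter are invented purely for illustration:

```python
# Toy contrast between the two regimes: without cultural transmission, each
# learner's effective training data is capped by its own lifetime; with
# language/culture/writing, a lossy fraction of everyone else's accumulated
# experience is also available. All numbers are made-up placeholders.

LIFETIME_SAMPLES = 1e9  # arbitrary stand-in for one lifetime of experience

def effective_data_no_culture(population: int) -> float:
    # Each learner only ever sees its own lifetime of data.
    return LIFETIME_SAMPLES

def effective_data_with_culture(population: int,
                                transmission_eff: float = 0.01) -> float:
    # Effective data now grows with the size of the population sharing it.
    return LIFETIME_SAMPLES * (1 + transmission_eff * (population - 1))

for pop in (1, 1_000, 1_000_000):
    print(pop, effective_data_no_culture(pop), effective_data_with_culture(pop))
```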
Zoom in and look at what a large NN is for: what does it do? It can soak up more data to acquire more knowledge and skills, and it also learns faster per timestep (since it searches in parallel over a wider circuit space at each step), but the latter is already captured in net training compute anyway. So intelligence is mostly about the volume of search space explored, which scales with net training compute; this is almost an obvious direct consequence of Solomonoff induction or derivations thereof.
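A rough sketch of that bookkeeping, using the standard ~6 FLOPs per parameter per training token estimate for dense networks; treating “search volume” as proportional to net training compute is just the claim above restated, not an established identity:

```python
# Bookkeeping for the argument above: a wider model searches more circuit
# space per step, but that is already folded into total training FLOPs,
# so "volume searched" is treated as a function of net compute alone.

def net_training_compute(n_params: float, n_tokens: float) -> float:
    # Rough estimate for dense networks: ~6 FLOPs per parameter per training
    # token (forward + backward pass).
    return 6 * n_params * n_tokens

def search_volume(n_params: float, n_tokens: float) -> float:
    # The paragraph's claim: capability tracks total search volume, which is
    # (up to a constant) net training compute, regardless of how it is split
    # between width-per-step and number of steps.
    return net_training_compute(n_params, n_tokens)

# Same total compute, different splits between "bigger brain, less data" and
# "smaller brain, more data": identical search volume under this accounting.
print(search_volume(1e11, 1e12))
print(search_volume(1e10, 1e13))
```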
I am not arguing that there are no more metasystems transitions, only that “make brains bigger” doesn’t automatically enable them. The single largest impact of digital minds is probably just speed. Not energy efficiency or software efficiency, just raw speed.