But on your model, what is the universal learning machine learning, at runtime?
On my model, one of the things it is learning is cognitive algorithms. And different classes of training setups + scale + training data result in it learning different cognitive algorithms: algorithms that can implement qualitatively different functionality.
Yes.
And my claim is that some setups let the learning system learn a (holistic) general-intelligence algorithm.
I consider a ULM to already encompass general/universal intelligence in the sense that a properly scaled ULM can learn anything, could become a superintelligence with vast scaling, etc.
You seem to consider the very idea of “algorithms” or “architectures” mattering to be silly. But what happens when a human groks how to do basic addition, then? Do they just go around memorizing what sum each set of numbers maps to, and we’re more powerful than animals because we can memorize more numbers?
I think I used that specific example earlier in a related thread: the most common algorithm most humans are taught and learn is memorization of a small lookup table for single-digit addition (and multiplication), combined with memorization of a short serial mental program for arbitrary-digit addition. Some humans learn more advanced ‘tricks’ or shortcuts, and more rarely perhaps even more complex, lower-latency parallel addition circuits.
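To make that concrete, here is a rough sketch of the procedure most humans end up running (illustrative code only; it says nothing about the neural implementation):

```python
# Illustrative sketch of the "memorized table + short serial program" view of
# human addition. Not a claim about how neurons implement it; just the algorithm.

# Memorized lookup table for single-digit sums (built programmatically here,
# but the point is that a human stores these as rote associations).
SINGLE_DIGIT_SUMS = {(a, b): a + b for a in range(10) for b in range(10)}

def add_via_serial_program(x: int, y: int) -> int:
    """Arbitrary-digit addition as a short serial procedure over the table."""
    xs = [int(d) for d in str(x)][::-1]  # digits, least significant first
    ys = [int(d) for d in str(y)][::-1]
    carry, out_digits = 0, []
    for i in range(max(len(xs), len(ys))):
        a = xs[i] if i < len(xs) else 0
        b = ys[i] if i < len(ys) else 0
        s = SINGLE_DIGIT_SUMS[(a, b)] + carry  # table lookup, then add the carry
        out_digits.append(s % 10)
        carry = s // 10
    if carry:
        out_digits.append(carry)
    return int("".join(str(d) for d in reversed(out_digits)))

assert add_via_serial_program(487, 69) == 556
```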
Core to the ULM view is the scaling hypothesis: once you have a universal learning architecture, novel capabilities emerge automatically with scale. Universal learning algorithms (as approximations of Bayesian inference) are more powerful/scalable than genetic evolution, and if you think through what (greatly sped-up) evolution running inside a brain during its lifetime would actually entail, it becomes clear it could evolve any specific capability within hardware constraints, given sufficient training compute/time and an appropriate environment (training data).
There is nothing more general/universal than that, just as there is nothing more general/universal than Turing machines.
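For concreteness, the idealization these learning algorithms approximate is just Bayesian updating over a hypothesis class, e.g. with a simplicity prior (this is the standard textbook form, not anything specific to my post):

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{\sum_{h' \in \mathcal{H}} P(D \mid h')\,P(h')}, \qquad P(h) \propto 2^{-K(h)}$$

where $\mathcal{H}$ is the hypothesis class, $D$ is the observed data, and $K(h)$ is a description-length measure. A universal learning architecture is one whose learning rule tractably approximates this update; roughly speaking, scale buys a richer effective $\mathcal{H}$ and more data to condition on.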
Is there any taxon X for which you’d agree that “evolution had to hit upon the X brain architecture before raw scaling would’ve let it produce a generally intelligent species”?
Not really: evolution converged on a similar universal architecture in many different lineages. Among vertebrates we have a few species of cetaceans, primates, and pachyderms which all scaled up to large brain sizes, and some avian species also scaled up to primate-level synaptic capacity (and the associated tool-use/problem-solving capabilities) with different but similar/equivalent convergent architecture. Language simply developed first in the primate genus Homo, probably due to a confluence of factors. But it’s clear that brain scale (specifically the synaptic capacity of ‘upper’ brain regions) is the single most important predictive factor in terms of which brain lineage evolves language/culture first.
But even some invertebrates (octopuses) are quite intelligent, and in each case there is convergence to a similar algorithmic architecture, achieved through different mechanisms (and predecessor structures).
It is not the case that the architecture of general intelligence is very complex and hard to evolve. It’s probably not more complex than the heart, or high-quality eyes, etc. Instead, the hard part is for a general-purpose robot to invent recursive, Turing-complete language from primitive communication: that developmental feat first appeared only at roughly foundation-model training scale, ~10^25 flops equivalent. Obviously that is not the minimum compute for a ULM to accomplish the feat. But all animal brains are first and foremost robots, and thriving at real-world robotics is incredibly challenging (general robotics is more challenging than language or early AGI, as all the self-driving-car companies are now finally learning). So language had to bootstrap from some random small excess plasticity budget, not the full training budget of the brain.
The greatest validation of the scaling hypothesis (and thus my 2015 ULM post) is the fact that AI systems began to match human performance once scaled up to similar levels of net training compute. GPT-4 is at least as capable as human linguistic cortex in isolation, and matches a significant chunk of the capabilities of an intelligent human. It has far more semantic knowledge, but is weak in planning, creativity, and of course motor control/robotics. None of that is surprising, as it is still missing a few main components that all intelligent brains contain (for agentic planning/search). But that is mostly a downstream compute limitation of current GPUs and algorithms vs. neuromorphic hardware/brains, and likely to be solved soon.
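The back-of-envelope version of that comparison, with every figure a rough order-of-magnitude assumption (the GPT-4 number in particular is an unofficial public estimate, not a disclosed figure):

```python
# Coarse order-of-magnitude comparison of lifetime brain "training compute"
# vs. large-model training compute. All parameters are assumptions chosen for
# scale intuition only, not measurements.

synapses = 1e14              # assumed cortical synapse count (order of magnitude)
avg_rate_hz = 1.0            # assumed average synaptic event rate (~0.1-10 Hz is defensible)
seconds_of_learning = 1e9    # roughly 30 years of experience
ops_per_event = 1            # treat one synaptic event as ~1 multiply-add equivalent

brain_lifetime_ops = synapses * avg_rate_hz * seconds_of_learning * ops_per_event
gpt4_training_flops = 2e25   # commonly cited unofficial estimate

print(f"brain lifetime ops ~ {brain_lifetime_ops:.0e}")   # ~1e23
print(f"GPT-4 training     ~ {gpt4_training_flops:.0e}")  # ~2e25
print(f"ratio              ~ {gpt4_training_flops / brain_lifetime_ops:.0f}x")
```

With these assumptions the two budgets land within about two orders of magnitude of each other, and equally defensible parameter choices move the brain figure anywhere from ~1e23 to ~1e25; that is the sense in which “similar levels of net training compute” should be read.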
Thanks for the detailed answers, that’s been quite illuminating! I still disagree, but I see the alternate perspective much more clearly now, and what would look like notable evidence for/against it.