My argument for the sharp discontinuity routes through the binary nature of general intelligence + an agency overhang, both of which could be hypothesized via non-evolution-based routes. Considerations about brain efficiency or Moore’s law don’t enter into it.
You claim later to agree with ULM (learning from scratch) over evolved modularity, but the paragraph above, and statements like these in your link:
The homo sapiens sapiens spent thousands of years hunter-gathering before starting up civilization, even after achieving modern brain size.
It would still be generally capable in the limit, but it wouldn’t be instantly omnicide-capable.
So when the GI component first coalesces,
Suggest to me that you have only partly propagated the implications of ULM and the scaling hypothesis. There is no hard secret to AGI—the architecture of systems capable of scaling up to AGI is not especially complex to figure out, and has in fact been mostly known for decades (Schmidhuber et al. figured most of it out long before the DL revolution). This is all strongly implied by ULM/scaling, because the central premise of ULM is that GI is the result of massively scaling up simple algorithms and architectures. Intelligence emerges from scaling simple algorithms, just as complexity emerges from scaling specific simple cellular automata rules (e.g. the Game of Life).
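For concreteness, here is a minimal sketch of that cellular-automata analogy (Conway's Game of Life), where one trivially simple update rule, scaled up to more cells and more steps, yields rich structure; grid size, seed, and step count are arbitrary illustrative choices:

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One update of Conway's Life rule: a cell is alive next step if it has
    exactly 3 live neighbors, or 2 live neighbors and is already alive."""
    # Count the 8 neighbors of every cell on a wrap-around (toroidal) grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(64, 64))  # random initial soup
for _ in range(100):                       # "scaling" here is just more cells and more steps
    grid = life_step(grid)
print(int(grid.sum()), "cells alive after 100 steps")
```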
All mammal brains share the same core architecture—not only is there nothing special about the human brain architecture, there is not much special about the primate brain other than hyperparameters better suited to scaling up to our size (a better scaling program). I predicted the shape of transformers (before the first transformers paper) and their future success with scaling in 2015, but also see the Bitter Lesson from 2019.
That post from EY starts with a blatant lie—if you have actually read Mind Children, you know Moravec predicted AGI around 2028, not 2010.
So evolution did need to hit upon, say, the primate architecture, in order to get to general intelligence.
Not really—many other animal species are generally intelligent as demonstrated by general problem solving ability and proto-culture (elephants seem to have burial rituals, for example), they just lack full language/culture (which is the sharp threshold transition). Also at least one species of cetacean may have language or at least proto-language (jury’s still out on that), but no technology due to lack of suitable manipulators, environmental richness etc.
It’s very clear, if you look at how the brain works in detail, that the core architectural components of the human brain are all present in a mouse brain, just at a much smaller scale. The brain also just tiles simple universal architectural components to solve any problem (from vision to advanced mathematics), and those components are very similar to modern ANN components due to a combination of intentional reverse engineering and parallel evolution/convergence.
There are a few specific weaknesses of current transformer-arch systems (lack of true recurrence, inference efficiency, etc.), but the solutions are all already in the pipeline, so to speak, and are mostly efficiency multipliers rather than scaling discontinuities.
But that only means the sharp left turn caused by the architectural-advance part – the part we didn’t yet hit upon, the part that’s beyond LLMs,
So this again is EMH, not ULM—there is absolutely no architectural advance in the human brain over our primate ancestors worth mentioning, other than scale. I understand the brain deeply enough to support this statement with extensive citations (and have, in prior articles I’ve already linked).
Taboo ‘sharp left turn’ - it’s an EMH term. The ULM equivalent is “Cultural Criticality” or “Culture Meta-systems Transition”. Human intelligence is the result of culture—an abrupt transition from training datasets & knowledge of size O(1) human lifetime to ~O(N*T). It has nothing to do with any architectural advance. If you take a human brain and raise it among animals, you just get a smart animal. The brain arch is already fully capable of advanced metalearning, but it won’t bootstrap to human STEM capability without an advanced education curriculum (the cultural transmission). Through culture we absorb the accumulated knowledge/wisdom of all of our ancestors, and this is a sharp transition. But it’s also a one-time event! AGI won’t repeat that.
It’s a metasystems transition similar to the unicellular->multicellular transition.
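To put rough, purely illustrative numbers (my own, not from anywhere in this thread) on that O(1) → O(N*T) jump: on the order of 10^11 humans have ever lived, so the cultural corpus a modern human trains on draws, with heavy compression and loss, on roughly

```latex
N \cdot T \;\sim\; 10^{11}\ \text{ancestral lifetimes of experience}
\quad\text{vs.}\quad
O(1)\ \text{lifetime for an isolated brain.}
```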
not only is there nothing special about the human brain architecture, there is not much special about the primate brain other than hyperparameters better suited to scaling up to our size
I don’t think this is entirely true. Injecting human glial cells into mice made them smarter. Certainly that doesn’t provide evidence for any sort of exponential difference, and you could argue it’s still just hyperparams, but it’s hyperparams that work better small too. I think we should be expecting sublinear growth in the quality of the simple algorithms, but should also be expecting that growth to continue for a while. It seems very silly that you of all people insist otherwise, given your interests.
We found that the glial chimeric mice exhibited both increased synaptic plasticity and improved cognitive performance, manifested by both enhanced long-term potentiation and improved performance in a variety of learning tasks (Han et al., 2013). In the context of that study, we were surprised to note that the forebrains of these animals were often composed primarily of human glia and their progenitors, with overt diminution in the relative proportion of resident mouse glial cells.
The paper which more directly supports the “made them smarter” claim seems to be this. I did somewhat anticipate this—“not much special about the primate brain other than ..”, but was not previously aware of this particular line of research and certainly would not have predicted their claimed outcome as the most likely vs various obvious alternatives. Upvoted for the interesting link.
Specifically I would not have predicted that the graft of human glial cells would have simultaneously both 1.) outcompeted the native mouse glial cells, and 2.) resulted in higher performance on a handful of interesting cognitive tests.
I’m still a bit skeptical of the “made them smarter” claim, as it’s always best to taboo ‘smarter’ and they naturally could have cherry-picked the tests (even unintentionally), but it does look like the central claim holds—that injection of human GPCs (glial progenitor cells) into fetal mice does result in mouse brains that learn at least some important tasks more quickly, and this is probably caused by facilitation of higher learning rates. However, it seems to come at the cost of higher energy expenditure, so it’s not clear yet that this is a pure Pareto improvement—it could be a tradeoff worthwhile in larger, sparser human brains but not in the mouse brain, such that it wouldn’t translate into a fitness advantage.
Or perhaps it is a straight-up Pareto improvement—that is not unheard of; viral horizontal gene transfer is a thing, etc.
We still seem to have some disconnect on the basic terminology. The brain is a universal learning machine, okay. The learning algorithms that govern it and its architecture are simple, okay, and the genome specifies only them. On our end, we can similarly implement the AGI-complete learning algorithms and architectures with relative ease, and they’d be pretty simple. Sure. I’ve held the same views from the beginning.
But on your model, what is the universal learning machine learning, at runtime? Look-up tables?
On my model, one of the things it is learning is cognitive algorithms. And different classes of training setups + scale + training data result in it learning different cognitive algorithms; algorithms that can implement qualitatively different functionality. Scale is part of it: larger-scale brains have the room to learn different, more sophisticated algorithms.
And my claim is that some setups let the learning system learn a (holistic) general-intelligence algorithm.
You seem to consider the very idea of “algorithms” or “architectures” mattering silly. But what happens when a human groks how to do basic addition, then? They go around memorizing what sum each set of numbers maps to, and we’re more powerful than animals because we can memorize more numbers?
It’s very clear, if you look at how the brain works in detail, that the core architectural components of the human brain are all present in a mouse brain, just at a much smaller scale
Shrug, okay, so let’s say evolution had to hit upon the Mammalia brain architecture. Would you agree with that?
Or we can expand further. Is there any taxon X for which you’d agree that “evolution had to hit upon the X brain architecture before raw scaling would’ve let it produce a generally intelligent species”?
But on your model, what is the universal learning machine learning, at runtime? ..
On my model, one of the things it is learning is cognitive algorithms. And different classes of training setups + scale + training data result in it learning different cognitive algorithms; algorithms that can implement qualitatively different functionality.
Yes.
And my claim is that some setups let the learning system learn a (holistic) general-intelligence algorithm.
I consider a ULM to already encompass general/universal intelligence in the sense that a properly scaled ULM can learn anything, could become a superintelligence with vast scaling, etc.
You seem to consider the very idea of “algorithms” or “architectures” mattering silly. But what happens when a human groks how to do basic addition, then? They go around memorizing what sum each set of numbers maps to, and we’re more powerful than animals because we can memorize more numbers?
I think I used specifically that example earlier in a related thread: the most common algorithm most humans are taught and learn is memorization of a small lookup table for single-digit addition (and multiplication), combined with memorization of a short serial mental program for arbitrary-digit addition. Some humans learn more advanced ‘tricks’ or shortcuts, and more rarely perhaps even more complex, lower-latency parallel addition circuits.
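A minimal sketch (my own illustration, not something from the thread) of that “memorized lookup table plus short serial carry program” picture of human addition:

```python
# The memorized single-digit table plays the role of the lookup table;
# add() is the short serial program applied column by column with a carry.
SINGLE_DIGIT_SUMS = {(a, b): a + b for a in range(10) for b in range(10)}

def add(x: int, y: int) -> int:
    xs = [int(d) for d in str(x)][::-1]   # least-significant digit first
    ys = [int(d) for d in str(y)][::-1]
    digits, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        a = xs[i] if i < len(xs) else 0
        b = ys[i] if i < len(ys) else 0
        s = SINGLE_DIGIT_SUMS[(a, b)] + carry  # one table lookup per column
        digits.append(s % 10)
        carry = s // 10
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in reversed(digits)))

assert add(487, 956) == 487 + 956
```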
Core to the ULM view is the scaling hypothesis: once you have a universal learning architecture, novel capabilities emerge automatically with scale. Universal learning algorithms (as approximations of Bayesian inference) are more powerful/scalable than genetic evolution, and if you think through what (greatly sped-up) evolution running inside a brain during its lifetime would actually entail, it becomes clear it could evolve any specific capability within hardware constraints, given sufficient training compute/time and an appropriate environment (training data).
There is nothing more general/universal than that, just as there is nothing more general/universal than Turing machines.
Is there any taxon X for which you’d agree that “evolution had to hit upon the X brain architecture before raw scaling would’ve let it produce a generally intelligent species”?
Not really—evolution converged on a similar universal architecture in many different lineages. In vertebrates we have a few species of cetaceans, primates and pachyderms which all scaled up to large brain sizes, and some avian species also scaled up to primate-level synaptic capacity (and associated tool-use/problem-solving capabilities) with different but similar/equivalent convergent architecture. Language simply developed first in the genus Homo, probably due to a confluence of factors. But it’s clear that brain scale—specifically the synaptic capacity of ‘upper’ brain regions—is the single most important predictive factor in terms of which brain lineage evolves language/culture first.
But even some invertebrates (octopuses) are quite intelligent—and in each case there is convergence to a similar algorithmic architecture, but achieved through different mechanisms (and predecessor structures).
It is not the case that the architecture of general intelligence is very complex and hard to evolve. It’s probably not more complex than the heart, or high-quality eyes, etc. Instead it’s just that, for a general-purpose robot, inventing recursive, Turing-complete language from primitive communication is a development feat that first appeared only around foundation-model training scale, ~10^25 FLOPs equivalent. Obviously that is not the minimum compute for a ULM to accomplish that feat—but all animal brains are first and foremost robots, and thriving at real-world robotics is incredibly challenging (general robotics is more challenging than language or early AGI, as all the self-driving car companies are now finally learning). So language had to bootstrap from some small random excess plasticity budget, not the full training budget of the brain.
The greatest validation of the scaling hypothesis (and thus my 2015 ULM post) is the fact that AI systems began to match human performance once scaled up to similar levels of net training compute. GPT-4 is at least as capable as the human linguistic cortex in isolation, and matches a significant chunk of the capabilities of an intelligent human. It has far more semantic knowledge, but is weak in planning, creativity, and of course motor control/robotics. None of that is surprising, as it’s still missing a few main components that all intelligent brains contain (for agentic planning/search). But this is mostly a downstream compute limitation of current GPUs and algorithms vs neuromorphic hardware/brains, and likely to be solved soon.
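For a rough back-of-envelope on that compute comparison (the figures here are my own round numbers, not anything from this thread: ~10^14–10^15 synapses, mean firing rates of order 1 Hz, ~10^9 seconds of lived experience by adulthood):

```latex
10^{14\text{--}15}\ \text{synapses} \;\times\; \sim\!1\,\text{Hz} \;\times\; 10^{9}\,\text{s}
\;\approx\; 10^{23\text{--}24}\ \text{synaptic events}
```

At a few flop-equivalents per synaptic event, that lands within an order of magnitude or so of the ~10^25 FLOPs foundation-model scale mentioned above.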
Thanks for the detailed answers, that’s been quite illuminating! I still disagree, but I see the alternative perspective much more clearly now, and what would look like notable evidence for/against it.
there is absolutely no architectural advance in the human brain over our primate ancestors worth mentioning, other than scale
I agree with this. However, how do you know that a massive advance isn’t still possible, especially as our NNs can use things such as backprop, potentially quantum algorithms to train weights, and other potential advances that simply aren’t possible for nature to use? Say we figure out the brain’s learning algorithm, get AGI, then quickly get something that combines the best of nature with tech-side advances not accessible to nature.
Of course a massive advance is possible, but mostly just in terms of raw speed. The brain seems reasonably close to Pareto efficiency in intelligence per watt for irreversible computers, but in the next decade or so I expect we’ll close that gap as we move into more ‘neuromorphic’ or PIM computing (computation closer to memory). If we used the ~1e16 W solar energy potential of just the Sahara desert, that would support a population of trillions of brain-scale AIs or uploads running at 1000x real-time.
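Sanity-checking that order of magnitude (assuming, as a round number of my own, ~10 W per brain-scale AI at parity efficiency):

```latex
\frac{10^{16}\ \text{W}}{10\ \text{W/brain} \times 1000\times\ \text{speedup}}
\;\approx\; 10^{12}\ \text{brain-scale AIs running at } 1000\times\ \text{real-time}
```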
especially as our NNs can use things such as backprop,
The brain appears to already be using algorithms similar to—but more efficient/effective than—standard backprop.
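For a concrete (if highly simplified) example of the kind of backprop-like rule discussed in the biologically-plausible-learning literature, here is a sketch of feedback alignment (Lillicrap et al. 2016), which replaces the transposed forward weights in the backward pass with a fixed random feedback matrix. This is my illustration, not a claim about which algorithm the brain actually runs:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 8, 32, 1, 0.05

W1 = rng.normal(0.0, 0.5, (n_in, n_hid))   # forward weights, layer 1
W2 = rng.normal(0.0, 0.5, (n_hid, n_out))  # forward weights, layer 2
B = rng.normal(0.0, 0.5, (n_out, n_hid))   # fixed random feedback weights (never trained)

X = rng.normal(size=(256, n_in))
y = np.tanh(X @ rng.normal(size=(n_in, n_out)))  # toy regression target

for _ in range(500):
    h = np.tanh(X @ W1)                 # forward pass
    y_hat = h @ W2
    e = y_hat - y                       # output error
    delta_h = (e @ B) * (1.0 - h ** 2)  # error routed back through B, not W2.T
    W2 -= lr * h.T @ e / len(X)         # gradient-like local updates
    W1 -= lr * X.T @ delta_h / len(X)

print("final MSE:", float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)))
```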
potentially quantum algorithms to train weights
This is probably mostly a nothingburger for various reasons, but reversible computing could eventually provide some further improvement, especially in a better location like buried in the lunar cold spot.