Said pushback is based on empirical studies of how the most powerful AIs at our disposal currently work, and is supported by a fairly convincing theoretical basis of its own. By comparison, the “canonical” takes are almost purely theoretical.
You aren’t really engaging with the evidence against the purely theoretical canonical/classical AI risk take. The ‘canonical’ AI risk argument is implicitly based on a set of interdependent assumptions/predictions about the nature of future AI:
1. fast takeoff is more likely than slow, downstream dependent on some combo of:
    - continuation of Moore’s Law
    - feasibility of hard ‘diamondoid’ nanotech
    - brain efficiency vs AI
    - AI hardware (in)dependence
2. the inherent ‘alien-ness’ of AI and AI values
3. supposed magical coordination advantages of AIs
4. arguments from analogies: namely evolution
These arguments are old enough that we can now update based on how the implicit predictions of the implied worldviews turned out. The traditional EY/MIRI/LW view has not aged well, which in part can be traced to its dependence on an old flawed theory of how the brain works.
For those who read HPMOR/LW in their teens/20s, a big chunk of your worldview is downstream of EY’s views and the specific positions he landed on with respect to key scientific questions around the brain and AI. His understanding of the brain came almost entirely from the ev psych and cognitive biases literature, and this model in particular—evolved modularity—hasn’t aged well and is just basically wrong. So this is entangled with everything related to AI risk (which is entirely about the trajectory of AI takeoff relative to human capability).
It’s not a coincidence that many in DL/neurosci have a very different view (shards etc.). In particular, the Moravec view (that AI will come from reverse engineering the brain, and that progress is entirely hardware-constrained and thus very smooth and predictable) turned out to be mostly correct. (His late-90s prediction of AGI around 2028 is especially prescient.)
So it’s pretty clear EY/LW was wrong on 1 (the trajectory of takeoff and the path to AGI), and Moravec et al. were correct.
Now, as the underlying reasons are entangled, Moravec et al. were also correct on point 2: AI from brain reverse engineering is not alien! (But really that argument was just weak regardless.) EY did not seriously consider that the path to AGI would involve training massive neural networks to literally replicate human thoughts.
Point 3 isn’t really taken seriously outside of the small LW sphere. By the very nature of alignment being a narrow target, any two random unaligned AIs are especially unlikely to be aligned with each other. The idea of a magical coordination advantage is based on highly implausible code-sharing premises (sharing your source code is generally a very bad idea, and regardless doesn’t and can’t actually prove that the code you shared is the code actually running in the world—the grounding problem is formidable and unsolved).
The problem with 4 (the analogy from evolution) is that it factually contradicts the doom worldview: evolution succeeded in aligning brains to IGF well enough despite a huge takeoff in the speed of cultural evolution over genetic evolution, as evidenced by the fact that humans have one of the highest fitness scores of any species ever, and almost certainly the fastest-growing fitness score.
You aren’t really engaging with the evidence against the purely theoretical canonical/classical AI risk take
Yes, but it’s because the things you’ve outlined seem mostly irrelevant to AGI Omnicide Risk to me? It’s not how I delineate the relevant parts of the classical view, and it’s not what’s been centrally targeted by the novel theories. The novel theories’ main claims are that powerful cognitive systems aren’t necessarily (isomorphic to) utility-maximizers, that shards (i. e., context-activated heuristics) reign supreme and value reflection can’t arbitrarily slip their leash, that “general intelligence” isn’t a compact algorithm, and so on. None of that relies on nanobots/Moore’s law/etc.
What you’ve outlined might or might not be the relevant historical reasons for how Eliezer/the LW community arrived at some of their takes. But the takes themselves, or at least the subset of them that I care about, are independent of these historical reasons.
fast takeoff is more likely than slow
Fast takeoff isn’t load-bearing on my model. I think it’s plausible for several reasons, but I think a non-self-improving human-genius-level AGI would probably be enough to kill off humanity.
the inherent ‘alien-ness’ of AI and AI values
I do address that? The values of two distant human cultures are already alien enough for them to see each other as inhuman and wish death on each other. It’s only after centuries of memetic mutation that we’ve managed to figure out how to not do that (as much).
supposed magical coordination advantages of AIs
I don’t think one needs to bring LDT/code-sharing stuff there in order to show intuitively how that’d totally work. “Powerful entities oppose each other yet nevertheless manage to coordinate to exploit the downtrodden masses” is a thing that happens all the time in the real world. Political/corporate conspiracies, etc.
“Highly implausible code-sharing premises” is part of why it’d be possible in the AGI case, but it’s just an instance of the overarching reason. Which is mainly about the more powerful systems being able to communicate with each other at higher bandwidth than with the weaker systems, allowing them to iterate on negotiations quicker and advocate for their interests during said negotiations better. Which effectively selects the negotiated outcomes for satisfying the preferences of powerful entities while effectively cutting out the weaker ones.
(Or something along these lines. That’s an off-the-top-of-my-head take; I haven’t thought on this topic much because multipolar scenarios aren’t central to my model. But it seems right.)
arguments from analogies: namely evolution
Yeah, we’ve discussed that some recently, and found points of disagreement. I should flesh out my view on how it’s applicable vs. not applicable later on, and make a separate post about that.
Yes, but it’s because the things you’ve outlined seem mostly irrelevant to AGI Omnicide Risk to me? It’s not how I delineate the relevant parts of the classical view, and it’s not what’s been centrally targeted by the novel theories
They are critically relevant. From your own linked post (how I delineate):
We only have one shot. There will be a sharp discontinuity in capabilities once we get to AGI, and attempts to iterate on alignment will fail. Either we get AGI right on the first try, or we die.
If takeoff is slow (1) because brains are highly efficient and brain engineering is the viable path to AGI, then we naturally get many shots (via simulation simboxes if nothing else), and there is no sharp discontinuity if Moore’s law also ends around the time of AGI (an outcome which brain efficiency, as a concept, predicts in advance).
We need to align the AGI’s values precisely right.
Not really—if the AGI is very similar to uploads, we just need to align them about as well as humans. Note this is intimately related to 1. and the technical relation between AGI and brains. If they are inevitably very similar then much of the classical AI risk argument dissolves.
You seem to be, like EY circa 2009, in what I would call the EMH (evolved modularity hypothesis) brain camp, as opposed to the ULM (universal learning machine) camp. It seems that, given the following two statements, you would put more weight on B than A:
A. The unique intellectual capabilities of humans are best explained by culture: our linguistically acquired mental programs, the evolution of which required vast synaptic capacity and thus is a natural emergent consequence of scaling.
B. The unique intellectual capabilities of humans are best explained by a unique architectural advance via genetic adaptations: a novel ‘core of generality’[1] that differentiates the human brain from animal brains.
[1] This is an EY term; if I recall correctly, he was still using it fairly recently.
There probably really is a series of core of generality insights in the difference between general mammal brain scaled to human size → general primate brain scaled to human size → actual human brain. Also, much of what matters is learned from culture. Both can be true at once.
But more to the point, I think you’re jumping to conclusions about what OP thinks. They haven’t said anything that sounds like EMH nonsense to me. Modularity is generated by runtime learning, and mechinterp studies it; there’s plenty of reason to think there might be ways to increase it, as you know. And that doesn’t even touch on the question of what training data is used.
If takeoff is slow (1) because brains are highly efficient and brain engineering is the viable path to AGI, then we naturally get many shots—via simulation simboxes if nothing else, and there is no sharp discontinuity if moore’s law also ends around the time of AGI (which brain efficiency predicts in advance).
My argument for the sharp discontinuity routes through the binary nature of general intelligence + an agency overhang, both of which could be hypothesized via non-evolution-based routes. Considerations about brain efficiency or Moore’s law don’t enter into it.
Brains are, in any case, very different architectures from our computers; they implement computations in very different ways. They could be maximally efficient relative to their architectures, but so what? It’s not at all obvious that FLOPS estimates of brainpower are highly relevant to predicting when our models would hit AGI, any more than the brain’s wattage is relevant.
They’re only soundly relevant if you’re taking the hard “only compute matters, algorithms don’t” position, which I reject.
It seems given the following two statements, you would put more weight on B than A:
I think both are load-bearing, in a fairly obvious manner, and that which specific mixture is responsible matters comparatively little.
Architecture obviously matters. You wouldn’t get LLM performance out of a fully-connected neural network, certainly not at realistically implementable scales. Even more trivially, you wouldn’t get LLM performance out of an architecture that takes in the input, discards it, spends 10^25 FLOPS generating random numbers, then outputs one of them. It matters how your system learns.
So evolution did need to hit upon, say, the primate architecture, in order to get to general intelligence.
Training data obviously matters. Trivially, if you train your system on randomly-generated data, it’s not going to learn any useful algorithms, no matter how sophisticated its architecture is. More realistically, without the exposure to chemical experiments, or any data that hints at chemistry in any way, it’s not going to learn how to do chemistry.
Similarly, a human not exposed to stimuli that would let them learn the general-intelligence algorithms isn’t going to learn them. You’d brought up feral children before, and I agree it’s a relevant data point.
So, yes, there would be no sharp left turn caused by the AIs gradually bootstrapping a culture, because we’re already feeding them the data needed for that.
But that only means the sharp left turn caused by the architectural-advance part – the part we didn’t yet hit upon, the part that’s beyond LLMs, the “agency overhang” – would be that much sharper. The AGI, once we hit on an architecture that’d accommodate its cognition, would be able to skip the hundreds of years of cultural evolution.
Edit:
You seem to be—like EY circa 2009 - in what I would call the EMH brain camp
Nope. I’d read e.g. Steve Byrnes’ sequence; I agree that most of the brain’s algorithms are learned from scratch.
My argument for the sharp discontinuity routes through the binary nature of general intelligence + an agency overhang, both of which could be hypothesized via non-evolution-based routes. Considerations about brain efficiency or Moore’s law don’t enter into it.
You claim later to agree with ULM (learning from scratch) over evolved-modularity, but the paragraph above and statements like this in your link:
The homo sapiens sapiens spent thousands of years hunter-gathering before starting up civilization, even after achieving modern brain size.
It would still be generally capable in the limit, but it wouldn’t be instantly omnicide-capable.
So when the GI component first coalesces,
Suggest to me that you have only partly propagated the implications of ULM and the scaling hypothesis. There is no hard secret to AGI—the architecture of systems capable of scaling up to AGI is not especially complex to figure out, and has in fact been mostly known for decades (Schmidhuber et al. figured most of it out long before the DL revolution). This is all strongly implied by ULM/scaling, because the central premise of ULM is that GI is the result of massively scaling up simple algorithms and architectures. Intelligence is emergent from scaling simple algorithms, like complexity emerges from scaling of specific simple cellular automata rules (i.e., Conway’s Life).
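As a toy illustration of that last point—rich complexity emerging from scaling a pair of trivial rules—here is a minimal sketch of Conway’s Game of Life in Python (an illustrative aside of mine, not code from the discussion; the function name and grid representation are arbitrary choices):

```python
from collections import Counter

def life_step(alive: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Advance one generation; `alive` is the set of live (x, y) cells."""
    # Count how many live neighbors each candidate cell has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Two rules: a dead cell with 3 neighbors is born; a live cell with 2 or 3 survives.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in alive)
    }

# A glider: five cells whose pattern propagates itself across the grid indefinitely.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = life_step(glider)
print(sorted(glider))  # the same shape, shifted by (1, 1) after 4 steps
```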
All mammal brains share the same core architecture—not only is there nothing special about the human brain architecture, there is not much special about the primate brain other than hyperparameters better suited to scaling up to our size (a better scaling program). I predicted the shape of transformers (before the first transformers paper) and their future success with scaling in 2015, but also see the Bitter Lesson from 2019.
That post from EY starts with a blatant lie—if you have actually read Mind Children, Moravec predicted AGI around 2028, not 2010.
So evolution did need to hit upon, say, the primate architecture, in order to get to general intelligence.
Not really—many other animal species are generally intelligent, as demonstrated by general problem-solving ability and proto-culture (elephants seem to have burial rituals, for example); they just lack full language/culture (which is the sharp threshold transition). Also, at least one species of cetacean may have language or at least proto-language (the jury’s still out on that), but no technology due to lack of suitable manipulators, environmental richness, etc.
It’s very clear, if you look at how the brain works in detail, that the core architectural components of the human brain are all present in a mouse brain, just at much smaller scale. The brain also just tiles simple universal architectural components to solve any problem (from vision to advanced mathematics), and those components are very similar to modern ANN components due to a combination of intentional reverse engineering and parallel evolution/convergence.
There are a few specific weaknesses of current transformer-arch systems (lack of true recurrence, inference efficiency, etc.), but the solutions are all already in the pipeline, so to speak, and are mostly efficiency multipliers rather than scaling discontinuities.
But that only means the sharp left turn caused by the architectural-advance part – the part we didn’t yet hit upon, the part that’s beyond LLMs,
So this again is EMH, not ULM—there is absolutely no architectural advance in the human brain over our primate ancestors worth mentioning, other than scale. I understand the brain deeply enough to support this statement with extensive citations (and have, in prior articles I’ve already linked).
Taboo ‘sharp left turn’: it’s an EMH term. The ULM equivalent is “Cultural Criticality” or “Culture Meta-systems Transition”. Human intelligence is the result of culture—an abrupt transition from training datasets & knowledge of size O(1) human lifetime to ~O(N*T). It has nothing to do with any architectural advance. If you take a human brain and have it raised by animals, you just get a smart animal. The brain arch is already fully capable of advanced metalearning, but it won’t bootstrap to human STEM capability without an advanced education curriculum (the cultural transmission). Through culture we absorb the accumulated knowledge/wisdom of all of our ancestors, and this is a sharp transition. But it’s also a one-time event! AGI won’t repeat that.
It’s a metasystems transition similar to the unicellular->multicellular transition.
not only is there nothing special about the human brain architecture, there is not much special about the primate brain other than hyperpameters better suited to scaling up to our size
I don’t think this is entirely true. Injecting human glial cells into mice made them smarter. Certainly that doesn’t provide evidence for any sort of exponential difference, and you could argue it’s still just hyperparams, but it’s hyperparams that work better at small scale too. I think we should be expecting sublinear growth in the quality of the simple algorithms, but should also be expecting that growth to continue for a while. It seems very silly that you of all people insist otherwise, given your interests.
We found that the glial chimeric mice exhibited both increased synaptic plasticity and improved cognitive performance, manifested by both enhanced long-term potentiation and improved performance in a variety of learning tasks (Han et al., 2013). In the context of that study, we were surprised to note that the forebrains of these animals were often composed primarily of human glia and their progenitors, with overt diminution in the relative proportion of resident mouse glial cells.
The paper which more directly supports the “made them smarter” claim seems to be this. I did somewhat anticipate this (“not much special about the primate brain other than...”), but I was not previously aware of this particular line of research and certainly would not have predicted their claimed outcome as the most likely vs. various obvious alternatives. Upvoted for the interesting link.
Specifically, I would not have predicted that the graft of human glial cells would have simultaneously both (1) outcompeted the native mouse glial cells, and (2) resulted in higher performance on a handful of interesting cognitive tests.
I’m still a bit skeptical of the “made them smarter” claim, as it’s always best to taboo ‘smarter’ and they naturally could have cherry-picked the tests (even unintentionally). But it does look like the central claim holds: injection of human GPCs (glial progenitor cells) into fetal mice does result in mouse brains that learn at least some important tasks more quickly, and this is probably caused by facilitation of higher learning rates. However, it seems to come at the cost of higher energy expenditure, so it’s not clear yet that this is a pure Pareto improvement—it could be a tradeoff worthwhile in larger, sparser human brains but not in the mouse brain, such that it wouldn’t translate into a fitness advantage.
Or perhaps it is a straight-up Pareto improvement—that is not unheard of; viral horizontal gene transfer is a thing, etc.
We still seem to have some disconnect on the basic terminology. The brain is a universal learning machine, okay. The learning algorithms that govern it and its architecture are simple, okay, and the genome specifies only them. On our end, we can similarly implement the AGI-complete learning algorithms and architectures with relative ease, and they’d be pretty simple. Sure. I was holding the same views from the beginning.
But on your model, what is the universal learning machine learning, at runtime? Look-up tables?
On my model, one of the things it is learning is cognitive algorithms. And different classes of training setups + scale + training data result in it learning different cognitive algorithms; algorithms that can implement qualitatively different functionality. Scale is part of it: larger-scale brains have the room to learn different, more sophisticated algorithms.
And my claim is that some setups let the learning system learn a (holistic) general-intelligence algorithm.
You seem to consider the very idea of “algorithms” or “architectures” mattering silly. But what happens when a human groks how to do basic addition, then? They go around memorizing what sum each set of numbers maps to, and we’re more powerful than animals because we can memorize more numbers?
Its very clear that if you look at how the brain works in detail that the core architectural components of the human brain are all present in a mouse brain, just much smaller scale
Shrug, okay, so let’s say evolution had to hit upon the Mammalia brain architecture. Would you agree with that?
Or we can expand further. Is there any taxon X for which you’d agree that “evolution had to hit upon the X brain architecture before raw scaling would’ve let it produce a generally intelligent species”?
But on your model, what is the universal learning machine learning, at runtime? ..
On my model, one of the things it is learning is cognitive algorithms. And different classes of training setups + scale + training data result in it learning different cognitive algorithms; algorithms that can implement qualitatively different functionality.
Yes.
And my claim is that some setups let the learning system learn a (holistic) general-intelligence algorithm.
I consider a ULM to already encompass general/universal intelligence in the sense that a properly scaled ULM can learn anything, could become a superintelligence with vast scaling, etc.
You seem to consider the very idea of “algorithms” or “architectures” mattering silly. But what happens when a human groks how to do basic addition, then? They go around memorizing what sum each set of numbers maps to, and we’re more powerful than animals because we can memorize more numbers?
I think I used specifically that example earlier in a related thread: the most common algorithm most humans are taught and learn is memorization of a small lookup table for single-digit addition (and multiplication), combined with memorization of a short serial mental program for arbitrary-digit addition. Some humans learn more advanced ‘tricks’ or shortcuts, and more rarely perhaps even more complex, lower-latency parallel addition circuits.
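A minimal sketch of that two-part procedure (my illustration, not code from the thread): a rote-memorized single-digit addition table, plus a short serial carry routine that reuses it for numbers of arbitrary length.

```python
# "Memorized" lookup table: all single-digit sums, as a human might rote-learn them.
ADD_TABLE = {(a, b): a + b for a in range(10) for b in range(10)}

def add_by_columns(x: int, y: int) -> int:
    """Serial 'mental program': add digit by digit, right to left, carrying as needed."""
    xs, ys = str(x)[::-1], str(y)[::-1]
    digits, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        a = int(xs[i]) if i < len(xs) else 0
        b = int(ys[i]) if i < len(ys) else 0
        s = ADD_TABLE[(a, b)] + carry   # only single-digit sums are ever looked up
        digits.append(s % 10)
        carry = s // 10
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in reversed(digits)))

assert add_by_columns(487, 925) == 1412
```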
Core to the ULM view is the scaling hypothesis: once you have a universal learning architecture, novel capabilities emerge automatically with scale. Universal learning algorithms (as approximations of Bayesian inference) are more powerful/scalable than genetic evolution, and if you think through what (greatly sped up) evolution running inside a brain during its lifetime would actually entail, it becomes clear it could evolve any specific capabilities within hardware constraints, given sufficient training compute/time and an appropriate environment (training data).
There is nothing more general/universal than that, just as there is nothing more general/universal than Turing machines.
Is there any taxon X for which you’d agree that “evolution had to hit upon the X brain architecture before raw scaling would’ve let it produce a generally intelligent species”?
Not really—evolution converged on a similar universal architecture in many different lineages. In vertebrates we have a few species of cetaceans, primates, and pachyderms which all scaled up to large brain sizes, and some avian species also scaled up to primate-level synaptic capacity (and associated tool/problem-solving capabilities) with different but similar/equivalent convergent architecture. Language simply developed first in the genus Homo, probably due to a confluence of factors. But it’s clear that brain scale—specifically the synaptic capacity of ‘upper’ brain regions—is the single most important predictive factor in terms of which brain lineage evolves language/culture first.
But even some invertebrates (octopuses) are quite intelligent—and in each case there is convergence to a similar algorithmic architecture, but achieved through different mechanisms (and predecessor structures).
It is not the case that the architecture of general intelligence is very complex and hard to evolve. It’s probably not more complex than the heart, or high-quality eyes, etc. Instead, it’s just that for a general-purpose robot, inventing recursive, Turing-complete language from primitive communication is a developmental feat that first appeared only at roughly foundation-model training scale, ~10^25 FLOPS-equivalent. Obviously that is not the minimum compute for a ULM to accomplish that feat—but all animal brains are first and foremost robots, and thriving at real-world robotics is incredibly challenging (general robotics is more challenging than language or early AGI, as all self-driving car companies are now finally learning). So language had to bootstrap from some random small excess plasticity budget, not the full training budget of the brain.
The greatest validation of the scaling hypothesis (and thus my 2015 ULM post) is the fact that AI systems began to match human performance once scaled up to similar levels of net training compute. GPT-4 is at least as capable as the human linguistic cortex in isolation, and matches a significant chunk of the capabilities of an intelligent human. It has far more semantic knowledge, but is weak in planning, creativity, and of course motor control/robotics. None of that is surprising, as it’s still missing a few main components that all intelligent brains contain (for agentic planning/search). But this is mostly a downstream compute limitation of current GPUs and algos vs. neuromorphic hardware/brains, and likely to be solved soon.
Thanks for the detailed answers, that’s been quite illuminating! I still disagree, but I see the alternate perspective much more clearly now, and what would look like notable evidence for/against it.
there is absolutely no architectural advance in the human brain over our primate ancestors worth mentioning, other than scale
I agree with this. However, how do you know that a massive advance isn’t still possible, especially as our NNs can use things such as backprop, potentially quantum algorithms to train weights, and other potential advances that simply aren’t possible for nature to use? Say we figure out the brain’s learning algorithm, get AGI, then quickly get something that uses the best of both nature and tech advances not accessible to nature.
Of course a massive advance is possible, but mostly just in terms of raw speed. The brain seems reasonably close to Pareto efficiency in intelligence per watt for irreversible computers, but in the next decade or so I expect we’ll close that gap as we move into more ‘neuromorphic’ or PIM computing (computation closer to memory). If we used the ~1e16 W solar energy potential of just the Sahara desert, that would support a population of trillions of brain-scale AIs or uploads running at 1000x real-time.
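A rough back-of-envelope version of that last figure (my arithmetic, not the commenter’s; it assumes ~20 W per brain-scale mind, the commonly cited human-brain power draw, and energy use scaling linearly with the 1000x speedup):

```python
# Back-of-envelope check of the Sahara claim, under the stated assumptions.
sahara_solar_watts = 1e16          # figure from the comment above
watts_per_brain = 20.0             # assumed: rough human-brain power draw
speedup = 1000                     # running at 1000x real-time
watts_per_fast_mind = watts_per_brain * speedup  # assumes energy scales with speed

population = sahara_solar_watts / watts_per_fast_mind
print(f"{population:.1e}")  # ~5.0e11 at brain parity; 'trillions' if hardware beats the brain's watts per mind
```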
especially as our NN can use stuff such as backprop,
The brain appears to already be using algorithms similar to, but more efficient/effective than, standard backprop.
potentially quantum algorithm to train weights
This is probably mostly a nothingburger for various reasons, but reversible computing could eventually provide some further improvement, especially in a better location like buried in the lunar cold spot.
Wouldn’t you expect (the many) current attempts to agentize LLMs to eat up a lot of the ‘agency overhang’? Especially since something like the reflection/planning loops of agentized LLMs seems to me like a pretty plausible description of what human brains might be doing (e.g. system 2 / system 1, or see many of Seth Herd’s recent writings on agentized/scaffolded LLMs and similarities to cognitive architectures).
I don’t think the current attempts have eaten the agency overhang at all. Basically none of them have worked, so the agency advantage hasn’t been realized. But the public efforts just haven’t put that much person-power into improving memory or executive function systems.
So I’m predicting a discontinuity in capabilities just like Thane is suggesting. I wrote another short post trying to capture the cognitive intuition: Sapience, understanding, and “AGI”. I think it might be a bit less sharp, since you might get an agent sort-of-working before it works really well. But the agency overhang is still there right now.
All of the points you listed make AGI risk worse, but none are necessary to have major concerns about it. That’s why they didn’t appear in the post’s summary of AGI x-risk logic.
I think this is a common and dangerous misconception. The original AGI x-risk story was wrong in many places. But that does not mean x-risk isn’t real.
Do you have a post or blog post on the risks we do need to worry about?
No, and that’s a reasonable ask.
To a first approximation my futurism is time acceleration, so the risks are the typical risks sans AI, but the timescale is hyperexponential a la Roodman. Even a more gradual takeoff would imply more risk to global stability on faster timescales than anything we’ve experienced in history; the wrong AGI race winners could create various dystopias.
I can’t point to such a site; however, you should be aware of AI Optimists (not sure if Jacob plans to write there). Also follow the work of Quentin Pope, Alex Turner, Nora Belrose, etc. I expect the site would point out what they feel to be the most important risks. I don’t know of anyone rational, no matter how optimistic, who doesn’t think there are substantial ones.
If you meant for current LLMs, some of the risks could be misuse of current LLMs by humans, or risks such as harmful content, harmful hallucination, privacy, memorization, bias, etc. For some other models, such as ranking/multiple ranking, I have heard some other worries about deception as well (this is only what I recall hearing, so it might be completely wrong).