Contra Yudkowsky on AI Doom
Eliezer Yudkowsky predicts doom from AI: that humanity faces likely extinction in the near future (years or decades) from a rogue, unaligned, superintelligent AI system. Moreover, he predicts that this is the default outcome, and that AI alignment is so incredibly difficult that even he failed to solve it.
EY is an entertaining and skilled writer, but do not mistake rhetorical writing talent for depth and breadth of technical knowledge. I do not have EY’s talents there, or Scott Alexander’s poetic powers of prose. My skill points have instead gone near-exclusively towards extensive study of neuroscience, deep learning, and graphics/GPU programming. More than most, I actually have the depth and breadth of technical knowledge necessary to evaluate these claims in detail.
I have evaluated this model in detail and found it substantially incorrect, and in fact brazenly, naively overconfident.
Intro
Even though the central prediction of the doom model is necessarily unobservable for anthropic reasons, alternative models (such as my own, or Moravec’s, or Hanson’s) have already made substantially better predictions, such that EY’s doom model has low posterior probability.
EY has espoused this doom model for over a decade, and hasn’t updated it much from what I can tell. Here is the classic doom model as I understand it, starting with its key background assumptions/claims:
1. Brain inefficiency: The human brain is inefficient along multiple dimensions that translate into intelligence per dollar, and is inefficient as a hardware platform on key metrics such as thermodynamic efficiency.
2. Mind inefficiency, or human incompetence: In terms of software, he describes the brain as an inefficient, complex “kludgy mess of spaghetti-code”. He derived these views from the influential evolved modularity hypothesis, as popularized in ev psych by Tooby and Cosmides. He pooh-poohed neural networks, and in fact actively bet against them in his actions: hiring researchers trained in abstract math/philosophy, ignoring neuroscience and early DL, etc.
3. More room at the bottom: Naturally dovetailing with points 1 and 2, EY confidently predicts there is enormous room for further software and hardware improvement, the latter especially through strong Drexlerian nanotech.
4. That alien mindspace: EY claims human mindspace is an incredibly narrow, twisty, complex target to hit, whereas the space of possible AI minds is vast, and AI designs will be something like random rolls from this vast alien landscape, resulting in an incredibly low probability of hitting the narrow human target.
Doom naturally follows from these assumptions: sometime in the near future, some team discovers the hidden keys of intelligence and creates a human-level AGI, which then rewrites its own source code, initiating a recursive self-improvement cascade that ultimately increases the AGI’s computational efficiency (intelligence/$, intelligence/J, etc.) by many OOM, far surpassing human brains. The AGI then quickly develops strong nanotech and kills all humans within a matter of days or even hours.
If assumptions 1 and 2 don’t hold (relative to 3), then there is little to no room for recursive self-improvement. If assumption 4 is completely wrong, then the default outcome is not doom regardless.
Every one of his key assumptions is mostly wrong, as I and others predicted well in advance. EY seems to have been systematically overconfident as an early futurist, and then perhaps updated later to avoid making specific predictions, but without much updating of his underlying mental models (specifically his nanotech-woo model, as we will see).
Brain Hardware Efficiency
EY correctly recognizes that thermodynamic efficiency is a key metric for computation/intelligence, and he confidently, brazenly claims (as of late 2021) that the brain is not that efficient, and is about 6 OOM from thermodynamic limits:
Which brings me to the second line of very obvious-seeming reasoning that converges upon the same conclusion—that it is in principle possible to build an AGI much more computationally efficient than a human brain—namely that biology is simply not that efficient, and especially when it comes to huge complicated things that it has started doing relatively recently.
ATP synthase may be close to 100% thermodynamically efficient, but ATP synthase is literally over 1.5 billion years old and a core bottleneck on all biological metabolism. Brains have to pump thousands of ions in and out of each stretch of axon and dendrite, in order to restore their ability to fire another fast neural spike. The result is that the brain’s computation is something like half a million times less efficient than the thermodynamic limit for its temperature—so around two millionths as efficient as ATP synthase. And neurons are a hell of a lot older than the biological software for general intelligence!
The software for a human brain is not going to be 100% efficient compared to the theoretical maximum, nor 10% efficient, nor 1% efficient, even before taking into account the whole thing with parallelism vs. serialism, precision vs. imprecision, or similarly clear low-level differences.
EY is just completely out of his depth here: he doesn’t seem to understand how the Landauer limit actually works, doesn’t seem to understand that synapses are analog MACs (multiply-accumulate operations) which minimally require OOMs more energy than simple binary switches, doesn’t seem to have a good model of the interconnect requirements, and so on.
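To make the disagreement concrete, here is a minimal back-of-envelope sketch in Python. Every input is an assumed round figure (brain power budget, synaptic event rate), not a measurement; the point is to show that a multi-OOM “gap” only appears if you count one analog synaptic operation as a single minimal bit erasure.
```python
import math

# Assumed round numbers, for illustration only:
k_B = 1.38e-23                             # Boltzmann constant, J/K
T = 310.0                                  # body temperature, K
landauer_per_bit = k_B * T * math.log(2)   # ~3e-21 J per minimal bit erasure

brain_watts = 20.0                         # assumed total brain power budget
synaptic_ops_per_s = 1e15                  # assumed, e.g. ~1e14 synapses at ~10 Hz

joules_per_synaptic_op = brain_watts / synaptic_ops_per_s   # ~2e-14 J
naive_gap = joules_per_synaptic_op / landauer_per_bit       # ~7e6

print(f"Landauer bound at 310 K: {landauer_per_bit:.1e} J/bit")
print(f"Energy per synaptic op:  {joules_per_synaptic_op:.1e} J")
print(f"Naive gap: ~{naive_gap:.0e}x")

# The huge apparent gap depends on equating one analog multiply-accumulate,
# plus the interconnect cost of moving the signal along axons/dendrites,
# with a single minimal bit erasure. An analog MAC at useful precision and
# reliability plausibly requires OOMs more than k*T*ln(2) even at the
# physical limit, which is the crux of the disagreement.
```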
Some attempt to defend EY by invoking reversible computing, but EY explicitly states that ATP synthase may be close to 100% thermodynamically efficient, and explicitly links the end result of extreme inefficiency to the specific cause of pumping “thousands of ions in and out of each stretch of axon and dendrite”—which would be irrelevant if the comparison were to some exotic reversible superconducting or optical computer. He never mentions reversible computing, and the phrase “biology is simply not that efficient” helps establish that we are both discussing conventional irreversible computation, not exotic reversible or quantum computing (neither of which is practical in the near future, or relevant for the nanotech he envisions, which is fundamentally robotic and thus constrained by the efficiency of applying energy to irreversibly transform matter).
He seems to believe biology is inefficient even given the practical constraints it is working with, not merely inefficient compared to some hypothetical future exotic computing platform considered without regard to other tradeoffs. Finally, if he actually believed (as I do) that brains are efficient within the constraints of conventional irreversible computation, this would substantially weaken his larger argument—and EY is not the kind of writer who weakens his own arguments.
In actuality biology is incredibly thermodynamically efficient, and generally seems to be near Pareto-optimal in that regard at the cellular nanobot level, but we’ll get back to that.
In a 30-year human “training run” the brain uses somewhere between 1e23 and 1e25 flops. ANNs trained with this amount of compute already capture much—but not all—of human intelligence. One likely reason is that flops is not the only metric of relevance: a human brain training run also uses 1e23 to 1e25 bytes of memops, which is still OOMs more than the likely largest ANN training run to date (GPT4), because GPUs have a 2 to 3 OOM gap between flops and memops.
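For transparency, here is the rough arithmetic behind those estimates; the synaptic-op range and the GPU throughput/bandwidth figures are assumed ballpark numbers, not precise specs.
```python
# Rough arithmetic behind the "30-year training run" estimate.
lifetime_s = 1e9                                # ~30 years in seconds

ops_per_s_low, ops_per_s_high = 1e14, 1e16      # assumed synaptic ops/sec range
print(f"Lifetime compute: {ops_per_s_low * lifetime_s:.0e} "
      f"to {ops_per_s_high * lifetime_s:.0e} synaptic ops")   # ~1e23 to ~1e25

# The brain touches roughly one synaptic weight per synaptic op, so its
# memory accesses scale ~1:1 with its compute. A modern GPU, by contrast,
# might offer ~3e14 flop/s of dense matmul throughput against ~2e12 bytes/s
# of HBM bandwidth (ballpark spec-sheet figures): a 2+ OOM flops-to-memops
# gap that it papers over with batching and weight reuse.
gpu_flops, gpu_bytes_per_s = 3e14, 2e12
print(f"GPU flops:memops ratio: ~{gpu_flops / gpu_bytes_per_s:.0f}:1")
```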
My model instead predicts that AGI will require GPT4-ish levels of training compute, and that SI (superintelligence) will require far more. To the extent that recursive self-improvement is actually a thing in the NN paradigm, it’s something that NNs mostly just do automatically (and something the brain currently still does better than ANNs).
Mind Software Efficiency
EY derived many of his negative beliefs about the human mind from the cognitive biases and ev psych literature, and especially Tooby and Cosmides’s influential evolved modularity hypothesis. The primary competitor to evolved modularity was/is the universal learning hypothesis and the associated scaling hypothesis, and there was already sufficient evidence to rule out evolved modularity back in 2015 or earlier.
Let’s quickly assess the predictions of evolved modularity vs universal learning/scaling. Evolved modularity posits that the brain is a kludgy mess of domain-specific evolved mechanisms (“spaghetti code” in EY’s words), and thus that AGI will probably not come from brain reverse engineering. On this view, human intelligence is exceptional because evolution figured out some “core to generality” that prior primate brains lack, but humans have only the minimal early version of it, so there is likely huge room for further improvement.
The universal learning/scaling model instead posits that there is a single obvious algorithmic signature for intelligence (approximate Bayesian inference), that it isn’t that hard to figure out, that evolution found it multiple times, and that human DL researchers figured much of it out in the 90s—i.e. intelligence is easy; it just takes enormous amounts of compute for training. As long as you don’t shoot yourself in the foot—as long as your architectural prior is flexible enough (e.g. transformers), and your approximation to Bayesian inference actually converges correctly (normalization, etc.)—then the amount of intelligence you get is proportional to the net compute spent on training. The human brain isn’t exceptional—it’s just a scaled-up primate brain—but scaling up the net training compute by about 10x (3x from a larger brain, 3x from extended neoteny, and some from arch/hyperparameter tweaking) was enough for linguistic intelligence and the concomitant Turing transition to emerge[1]. EY hates the word emergence, but intelligence is an emergent phenomenon.
The universal learning/scaling model was largely correct—as tested by OpenAI scaling up GPT to proto-AGI.
That does not mean we are on the final scaling curve. The brain is of course strong evidence of other scaling choices that look different from chinchilla scaling. A human brain’s natural ‘clock rate’ of about 100 Hz supports a thoughtspeed of about 10 tokens per second, or only about 10 billion tokens per lifetime training run. GPT3 trained on about 50x a human lifetime of experience/data, and GPT4 may have trained on 1000 human lifetimes of experience/data. You can spend roughly the same compute budget training a huge brain-sized model for just one human lifetime, or spend it on a 100x smaller model trained for far longer. You don’t end up in exactly the same space of course—GPT4 has far more crystallized knowledge than any one human, but seems to still lack much of a human domain expert’s fluid intelligence capabilities.
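As a toy illustration of that tradeoff, here is a sketch using the standard C ≈ 6·N·D approximation for dense transformer training flops; the brain-scale parameter count (taken as roughly the synapse count) and the token rates are loose assumptions.
```python
# Two ways to spend the same training budget (illustrative numbers only).
def tokens_for_budget(budget_flops, params):
    """Invert the standard C ~ 6*N*D dense-transformer training cost."""
    return budget_flops / (6 * params)

n_brain = 1e14                     # assumed brain-scale model, ~synapse count
d_lifetime = 1e10                  # ~10 tokens/s over ~1e9 s: one human lifetime
budget = 6 * n_brain * d_lifetime  # fix the budget at "one brain, one lifetime"

n_small = n_brain / 100            # a 100x smaller model
d_small = tokens_for_budget(budget, n_small)

print(f"Budget: {budget:.0e} flops")
print(f"Brain-sized model ({n_brain:.0e} params): {d_lifetime:.0e} tokens (~1 lifetime)")
print(f"100x smaller model ({n_small:.0e} params): {d_small:.0e} tokens "
      f"(~{d_small / d_lifetime:.0f} lifetimes)")
```
Same budget, very different points in (model size, data) space; that is roughly the GPT-vs-brain difference described above.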
Moore Room at the Bottom
If the brain really is ~6 OOM from thermodynamic efficiency limits, then we should not expect Moore’s Law to end with brains still holding a non-trivial thermodynamic efficiency advantage over digital computers. Except that is exactly what is happening. TSMC is approaching the limits of circuit miniaturization, and it is increasingly obvious that fully closing the (now not so large) gap with the brain will require more directly mimicking it through neuromorphic computing[2].
Biological cells operate directly at thermodynamic efficiency limits: they copy DNA using near minimal energy, and in general they perform the robotics task of rearranging matter using near minimal energy. For nanotech replicators (and nanorobots in general) like biological cells, thermodynamic efficiency is the dominant constraint, and biology is already Pareto optimal there. No SI will ever create strong nanotech that significantly improves on the thermodynamic efficiency of biology—unless/until it can rewrite the laws of physics.
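A rough worked comparison for the DNA-copying claim, using coarse textbook-level values for nucleotide incorporation costs; treat the specific numbers as assumptions, only the OOM matters.
```python
import math

# How close is DNA replication to the thermodynamic floor for copying bits?
RT = 8.314 * 310 / 1000              # ~2.6 kJ/mol at body temperature
landauer_per_bit = RT * math.log(2)  # ~1.8 kJ/mol per bit

bits_per_base = 2                    # one of four nucleotides = 2 bits
floor_per_base = bits_per_base * landauer_per_bit   # ~3.6 kJ/mol

# Assumed cost: ~2 high-energy phosphate bonds hydrolyzed per base added,
# at very roughly 30-50 kJ/mol each under cellular conditions.
cost_per_base = 2 * 40               # ~80 kJ/mol, rough midpoint

print(f"Landauer floor per base: ~{floor_per_base:.1f} kJ/mol")
print(f"Approximate actual cost: ~{cost_per_base:.0f} kJ/mol")
print(f"Ratio: ~{cost_per_base / floor_per_base:.0f}x")
# A few tens of kT per bit copied is roughly what reliable, fast copying at
# ~310 K appears to require: within a couple OOM of the bare bound, not the
# ~6 OOM gap claimed for neural computation.
```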
Of course an AGI could still kill much of humanity using advanced biotech weapons—e.g. a supervirus—but that is beyond the scope of EY’s specific model, and for various reasons, mostly stemming from the strong prior that biology is super efficient, I expect humanity to be very difficult to kill in this way (and to grow harder to kill every year as we advance prosaic AI tech). Also, killing humanity would likely not be in the best interests of even an unaligned AGI, because humans will probably continue to be key components of the economy (as highly efficient general-purpose robots) long after AGI running in datacenters takes most higher-paying intellectual jobs. So instead I expect unaligned power-seeking AGIs to adopt much more covert strategies for world domination.
That Alien Mindspace
In the “design space of minds in general” EY says:
Any two AI designs might be less similar to each other than you are to a petunia.
Quintin Pope has already written a well-argued critique of this alien mindspace meme from the DL perspective, and I already criticized the meme once when it was fresh, over a decade ago. So today I will instead take a somewhat different approach (an updated elaboration of my original critique).
Imagine we have some set of mysterious NNs which we’d like to replicate, but to which we only have black-box access. By that I mean we have many, many examples of the likely (partial) inputs and outputs of these networks, and some ideas about the architecture, but we don’t have any direct access to the weights.
It turns out there is a simple and surprisingly successful technique one can use to create a partial emulation of any such ensemble of NNs: distillation. In essence, distillation is simply the process of training one NN on the collected inputs/outputs of other NNs, such that it learns to emulate them.
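Here is a minimal sketch of that core operation in PyTorch-style Python; the student, teacher, optimizer, and temperature are hypothetical stand-ins rather than any particular system’s setup.
```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x, optimizer, temperature=2.0):
    """One distillation step: push the student's outputs toward the teacher's on batch x."""
    with torch.no_grad():
        teacher_logits = teacher(x)          # black-box access: outputs only
    student_logits = student(x)
    # KL divergence between temperature-softened output distributions
    # (the classic soft-target loss of Hinton et al. 2015).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
When only collected samples of the teacher’s outputs are available, rather than its full output distributions, the same procedure reduces to ordinary cross-entropy training on hard targets.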
This is exactly how we train modern large ANNs, and LLMs specifically: by training them on the internet, we are training them on human thoughts and thus (partially) distilling human minds.
Thus my model (or the systems/cybernetic model in general) correctly predicted—well in advance—that LLMs would have anthropomorphic cognition, mirroring many of our seemingly idiosyncratic cognitive biases, quirks, and limitations. We now have AGI that can write poems and code (like humans) but struggles with multiplying numbers (like humans), generally exhibits human-like psychology, and is susceptible to flattery, priming, the Jungian “shadow self” effect, etc. There is a large and growing pile of specific evidence that LLMs are distilling/simulating human minds, some of which I and others have collected in prior posts, but the strength of this argument should already establish a strong prior expectation that distillation is the default outcome.
The width of mindspace is completely irrelevant. Moravec, myself, and the other systems-thinkers were correct: AIs are and will be our mind children; the technosphere extends the noosphere.
This alone does not strongly entail that AGI will be aligned by default, but it does defeat EY’s argument that AGI will be unaligned by default (and he loses many Bayes points, which I gain).
The Risk Which Remains
To be clear, I am not arguing that AGI is not a threat. It is rather obviously the pivotal eschatonic event, the closing chapter in human history. Of course ‘it’ is dangerous, for we are dangerous. But that does not mean 1) that extinction is the most likely outcome, 2) that alignment is intrinsically more difficult than AGI itself, or 3) that EY’s specific arguments are the especially relevant and correct way to arrive at any such conclusions.
You will likely die, but probably not because of a nanotech holocaust initiated by a god-like machine superintelligence. Instead you will probably die when you simply can no longer afford the tech required to continue living. If AI does end up causing humanity’s extinction, it will probably be the result of a slower, more prosaic process of gradually out-competing us economically. AGI is not inherently mortal and can afford patience, unlike us mere humans.
[1] The Turing transition: brains evolved linguistic symbolic communication, which permits compressing and sharing thoughts across brains, forming a new layer of networked social computational organization and allowing minds to emerge as software entities. This is a one-time transition, as there is nothing more universal/general than a Turing machine.
[2] Neuromorphic computing is the main eventual long-term threat to current GPUs/accelerators, and a continuation of the trend of embedding efficient matrix ops into the hardware, but it is unlikely to completely replace them for various reasons, and I don’t expect it to be very viable until traditional Moore’s Law has mostly ended.