Fast takeoff traditionally implies time from AGI to singularity measured in hours or days, which you just don’t get with merely mundane improvements like copying or mild algorithmic advances. EY (and perhaps Bostrom to some extent) anticipated a fast takeoff explicitly enabled by many OOM of brain inefficiency, such that the equivalent of many decades of Moore’s Law could be compressed into mere moments. The key rate limiter in these scenarios ends up being the ability to physically move raw materials through complex supply-chain processes to produce more computing substrate, a limit which is bypassed through the use of hard Drexlerian nanotech.
But it turns out that biology is already near-optimal (cells in particular are essentially optimal nanoscale robots; thus Drexlerian nanotech is probably a pipe dream), so that just isn’t the world we live in.
Quoting Yudkowsky’s Intelligence Explosion Microeconomics, page 30:

What sort of general beliefs does this concrete scenario of “hard takeoff” imply about returns on cognitive reinvestment? It supposes that:
An AI can get major gains rather than minor gains by doing better computer science than its human inventors.
More generally, it’s being supposed that an AI can achieve large gains through better use of computing power it already has, or using only processing power it can rent or otherwise obtain on short timescales—in particular, without setting up new chip factories or doing anything else which would involve a long, unavoidable delay.
An AI can continue reinvesting these gains until it has a huge cognitive problem-solving advantage over humans.
… so Yudkowsky’s picture of hard takeoff explicitly does not route through inefficiency in the brain’s compute hardware, it routes through inefficiency in algorithms. He’s expecting the Drexlerian nanotech to come at the end of the hard takeoff; nanotech is not the main mechanism by which hard takeoff is enabled.
The core idea of hard takeoff is that algorithmic advances can get to superintelligence without needing to build lots of new hardware. Your brain efficiency post doesn’t particularly argue against that.
That quote seemed to disagree so much with my model of early EY that I had to go back and re-read it. And I now genuinely think my earlier summary is still quite accurate.
[pg 29]:
Some sort of AI project run by a hedge fund, academia, Google,[^37] or a government, advances to a sufficiently developed level (see section 3.10) that it starts a string of self-improvements that is sustained and does not level off. This cascade of self-improvements might start due to a basic breakthrough by the researchers which enables the AI to understand and redesign more of its own cognitive algorithms …
Once this AI started on a sustained path of intelligence explosion, there would follow some period of time while the AI was actively self-improving, and perhaps obtaining additional resources, but hadn’t yet reached a cognitive level worthy of being called “superintelligence.” This time period might be months or years,[^38] or days or seconds.[^39]

At some point the AI would reach the point where it could solve the protein structure prediction problem and build nanotechnology—or figure out how to control atomic force microscopes to create new tool tips that could be used to build small nanostructures which could build more nanostructures—or perhaps follow some smarter and faster route to rapid infrastructure. An AI that goes past this point can be considered to have reached a threshold of great material capability. From this would probably follow cognitive superintelligence (if not already present); vast computing resources could be quickly accessed to further scale cognitive algorithms.
Notice I said “Fast takeoff traditionally implies time from AGI to singularity measured in hours or days, which you just don’t get with merely mundane improvements like copying or mild algorithmic advances”—which doesn’t disagree with anything here, as I was talking about the time from AGI to singularity, and regardless EY indicates that rapid takeoff to superintelligence probably requires Drexlerian nanotech.
AGI → Superintelligence → Singularity
Also EY clearly sees nanotech as the faster replacement for slow chip foundry cycles, as I summarized:
Given a choice of investments, a rational agency will choose the investment with the highest interest rate—the greatest multiplicative factor per unit time. In a context where gains can be repeatedly reinvested, an investment that returns 100-fold in one year is vastly inferior to an investment which returns 1.001-fold in one hour. At some point an AI’s internal code changes will hit a ceiling, but there’s a huge incentive to climb toward, e.g., the protein-structure-prediction threshold by improving code rather than by building chip factories
Without Drexlerian nanotech to smash through the code-efficiency ceiling, the only alternative is the slower chip-foundry route, which of course is also largely stalled if brains are efficient and already equivalent to end-of-Moore’s-Law tech.
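As a quick sanity check on the quoted interest-rate comparison (my own arithmetic, not from the original paper): compounding 1.001x per hour over a year vastly outpaces a one-shot 100x annual return.

```python
# Check: 1.001x per hour, compounded over one year, vs. a flat 100x per year.
HOURS_PER_YEAR = 24 * 365

hourly_compounded = 1.001 ** HOURS_PER_YEAR  # compounds to thousands-fold per year
one_shot_annual = 100.0

assert hourly_compounded > one_shot_annual
print(f"1.001x/hour over a year: {hourly_compounded:,.0f}x vs {one_shot_annual:.0f}x")
```

This is just the usual compounding logic: many small multiplicative gains reinvested on short timescales dominate a larger gain on a long timescale.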
Regardless, in the brain efficiency post I also argue against many OOM of brain software inefficiency. (If anything, the brain’s incredible data efficiency is increasingly looking like a difficult barrier for AGI.)
I think there’s some inconsistent usage of “superintelligence” here. IIRC Yudkowsky also mentioned somewhere that he doesn’t expect humans to be able to build nanotech any time soon without AGI, therefore presumably he expects the AGI needs to be very superhuman to build nanotech. His fast takeoff scenario therefore involves the AGI reaching very superhuman levels before it starts to invest in manufacturing. But he’s using the term “superintelligence” for something quite a bit more powerful than just “very superhuman”.
For strategic purposes, it’s the weaker version (i.e. “very superhuman”) which is mostly relevant.
Regardless, in the brain efficiency post I also argue against many OOM of brain software inefficiency.
You argued that current DL systems are mostly less data-efficient than the brain (and at best about the same). That is extremely weak evidence that nothing more data-efficient than the brain exists. And you didn’t argue at all about any other dimensions of reasoning algorithms—e.g. search efficiency, ability to transport information/models to new domains, model expressiveness, efficiency of plans, coordination/communication, metastuff, etc.
I think you are missing the forest of my argument for its trees. The default hypothesis—the one that requires evidence to update against—is now that the brain is efficient in most respects, rather than the converse.
The larger update is that evolution is both fast and efficient. It didn’t proceed through some slow analog of Moore’s Law, where initial terribly inefficient designs are slowly improved. Biological evolution developed near-optimal nanotech quickly, and then slowly built up larger structures. It moved slowly only because it was never optimizing for intelligence at all, not because it is inherently slow and inefficient. But intelligence is often useful, so eventually it developed near-optimal designs for various general learning machines—not in humans, but in much earlier brains.
Human brains are simply standard primate brains, scaled up, with a few tweaks for language. The phase transition around human intelligence is entirely due to language adding another layer of systemic organization (like the multicellular transition), and due to culture allowing us to learn from all past human experience, so our (compressed) training dataset scales with our exponentially growing population, versus being essentially constant as it is for animals.
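A toy model (my own illustration, with made-up growth numbers) of the claimed scaling difference: if each generation contributes data in proportion to population size and population grows exponentially, the cumulative cultural dataset dwarfs any single animal’s one lifetime of experience.

```python
# Toy model: cumulative "cultural" data under exponential population growth,
# vs. a solitary animal whose dataset is always one lifetime of experience.
GROWTH_PER_GENERATION = 1.02  # assumed 2% growth per generation (illustrative)
GENERATIONS = 200

population = 1.0
cultural_dataset = 0.0  # cumulative, in units of "one lifetime of experience"
for _ in range(GENERATIONS):
    cultural_dataset += population  # each generation adds data proportional to headcount
    population *= GROWTH_PER_GENERATION

animal_dataset = 1.0  # one lifetime, every generation, forever

assert cultural_dataset > 1000 * animal_dataset
```

The specific constants are arbitrary; the point is just that a geometric series of contributions grows without bound while the per-animal dataset stays fixed.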
Deep learning is simply reverse engineering the brain (directly and indirectly), and this was always the only viable path to AGI [1]. Based on the large amount of evidence we have from DL and neuroscience, it’s fairly clear (to me at least) that the brain is also probably near optimal in data efficiency (in predictive gain per bit of sensor data per unit of compute—not to be confused with sample efficiency, which you can always improve at the cost of more compute).
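The sample-efficiency/compute trade-off mentioned above can be illustrated with a toy sketch (my own example, not from the post): re-running an optimizer over the same fixed samples spends more compute to extract a better estimate from each sample.

```python
import random

random.seed(0)
samples = [random.gauss(5.0, 1.0) for _ in range(20)]  # one fixed batch of data
target = sum(samples) / len(samples)                   # best estimate the data supports

def sgd_estimate(data, epochs, lr=0.01):
    """Estimate the mean by SGD on squared error over the SAME samples.

    More epochs = more compute spent on the same data."""
    est = 0.0
    for _ in range(epochs):
        for x in data:
            est -= lr * 2.0 * (est - x)  # gradient step on (est - x)**2
    return est

err_cheap = abs(sgd_estimate(samples, epochs=1) - target)
err_heavy = abs(sgd_estimate(samples, epochs=50) - target)
assert err_heavy < err_cheap  # same samples, more compute, better per-sample use
```

The estimator with 50 passes squeezes more out of each sample than the single-pass one, at 50x the compute—the trade-off the parenthetical points at, in miniature.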
Of course AGI will have advantages (mostly in expanding beyond the limitations of human lifetimes and the associated brain sizes and slow interconnect); but overall it’s more like the beginning of a Cambrian explosion that is a natural continuation of biological brain evolution, rather than some alien invasion. At this point we have actually heavily explored the landscape of Bayesian learning algorithms, and huge surprises are unlikely.
The default hypothesis—the one that requires evidence to update against—is now that the brain is efficient in most respects, rather than the converse.
I think you have basically not made that case, certainly not to such a degree that people who previously believed the opposite will be convinced. You explored a few specific dimensions—like energy use, heat dissipation, circuit depth. But these are all things which we’d expect to have been under lots of evolutionary pressure for a long time. They’re also all relatively “low-level” things, in the sense that we wouldn’t expect to need unusually intricate genetic machinery to fine-tune them; we’d expect all those dimensions to be relatively accessible to evolutionary exploration.
If you want to make the case that brain efficiency is the default hypothesis, then you need to argue it in cases where the relevant capabilities weren’t obviously under lots of selection pressure for a long time (e.g. recently acquired capabilities like language), or where someone might expect the architectural complexity to be too great for the genetic information bottleneck. You need to address at least some “hard” cases for brain efficiency, not just “easy” cases.
Or, another angle: I’d expect that, by all of the efficiency measures in your brain efficiency post, a rat brain also looks near-optimal. Therefore, by analogy to your argument, we should conclude that it is not possible for some new biological organism to undergo a “hard takeoff” (relative to evolutionary timescales) in intelligent reasoning capabilities. Where does that argument fail? What inefficiency in the rat brain did humanity improve on? If it was language, why do you expect that the apparently all-important language capability is near-optimal in humans? Also, why should we expect there won’t be some other all-important capability, just as language was a new super-important capability in the rat → human transition?
I already made much of the brain architecture/algorithms argument in an earlier post: “The Brain as a Universal Learning Machine”. In a nutshell, EY/LW folks got much of their brain model from the heuristics-and-biases and ev-psych literature, which is based on the evolved modularity hypothesis, which turned out to be near completely wrong. So just by reading the sequences and the associated literature, LW folks have unfortunately picked up a fairly inaccurate default view of the brain.
The brain is a very generic/universal learning system, built mostly out of a few complementary types of neural computronium (cortex, cerebellum, etc.) plus an actual practical recursive self-improvement learning system that rapidly learns efficient circuit architectures from lifetime experience. The general meta-architecture is not specific to humans, primates, or even mammals; in fact it is highly convergent and conserved—evolution found and preserved it again and again across wildly divergent lineages. So there isn’t much room for improvement in architecture; most of the improvement comes from scaling.
Nonetheless, there are important differences across lineages: primates, along with some birds and perhaps some octopods, have the most scaling-efficient architectures in terms of neuron/synapse density, but these differences are most likely due to diverging optimization pressures along a Pareto efficiency frontier.
The differences in brain capabilities are then mostly just scaling differences: human brains are just 4x-scaled-up primate brains, with nearly zero detectable divergences from the core primate architecture. (Brain size is not a static feature of the architecture; the architecture also defines a scaling plan, so you can think of size as a tunable hyperparameter with many downstream modifications to the wiring prior.) Rodent brain architecture probably has the worst scaling plan; rodents are likely optimized for speed and rarely grew large.
Thanks.

I think Yudkowsky used to expect improvements in the low-level compute too, e.g. “Still, pumping the ions back out does not sound very adiabatic to me?”.