But current and future computers will have much, much larger energy budgets. They can therefore operate much faster (and they do).
Faster clock speed, but not faster thought speed, as they just burn all that speed inefficiently simulating a large circuit. Even though a single GPU has nominal flops similar to the brain’s and uses 30x more power, it has about 3 OOM less memory and memory bandwidth. GPUs are amazing at simulating insect brains at high speeds.
But we want big brain-scale ANNs, as that is what intelligence requires. So you need ~1000 GPUs in parallel, with complex, expensive high-bandwidth interconnect, to get a single brain-size ANN, at which point you also get 1000 instances (of the same mind). That only allows you to run it at brain speed, not any faster. You can’t then just run it on 1 million GPUs to get a 1000x speedup; that’s not how it works at all. Instead you’d get 1 million instances across 1000 brain-size ANNs. This ultimately relates to energy flow efficiency (see the section on circuits). Energy efficiency is a complex multi-dimensional engineering constraint set, not a simple linear economic multiplier.
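The sizing argument above can be sketched as simple ratio arithmetic. This is only an illustration using the rough ratios quoted in the comment (similar flops, 30x power, about 3 OOM less memory); the constant and function names are made up for the sketch, and it ignores interconnect overhead entirely.

```python
# Ratios of one GPU relative to one human brain, as quoted in the comment above.
GPU_FLOPS_RATIO = 1.0      # "similar nominal flops"
GPU_POWER_RATIO = 30.0     # "uses 30x more power"
MEMORY_SHORTFALL = 1000    # "about 3 OOM less memory": brain memory / GPU memory


def cluster_for_one_brain_scale_ann():
    """Size the smallest GPU cluster that can hold one brain-scale ANN.

    Returns (gpus_needed, instances_at_brain_speed, power_per_instance),
    with power expressed in units of one brain's power draw.
    """
    # Memory, not flops, sets the minimum cluster size.
    gpus_needed = MEMORY_SHORTFALL
    # Each GPU contributes about one brain's worth of flops, so the cluster
    # can run this many copies of the same weights at brain speed.
    instances = round(gpus_needed * GPU_FLOPS_RATIO)
    # Total cluster power divided over the instances it can host.
    power_per_instance = gpus_needed * GPU_POWER_RATIO / instances
    return gpus_needed, instances, power_per_instance


if __name__ == "__main__":
    gpus, instances, power = cluster_for_one_brain_scale_ann()
    print(f"{gpus} GPUs -> {instances} instances at ~{power:.0f}x brain power each")
```

Under these assumed ratios, adding more GPUs raises the instance count but not the speed of any single instance, which is the point the comment is making.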
Moore’s Law isn’t going to improve this scenario much, at least not for GPUs or any von Neumann-style architecture.
Moore’s Law will eventually allow a very specific narrow class of designs to simultaneously achieve brain scale and high speedup, but that narrow class of designs is necessarily neuromorphic and similar to an artificial brain. Furthermore, economic pressure will naturally push the industry towards neuromorphic brain style AGI designs, as they will massively outcompete everything else.
These are the engineering constraints from physics the article is attempting to elucidate.
If they are using high energy budgets, they don’t need to build chips to be more and more like the human brain, which has a low energy budget. In other words, they don’t need this:
Given the choice between a neuromorphic design which can run 1,000 instances of 1,000 unique agent minds at 100x the speed of human thought, or a von Neumann-type design which can run 1,000 instances of only 1 agent mind at 1x the speed of human thought at the same price, the latter is not competitive.
AGI is not defined as hardware that performs computations as energy-efficiently as the brain. Instead, it is software that performs all important intellectual tasks as effectively as the brain, cost be damned.
The energy cost of a human worker is something like 0.1% of their cost/value; the rest is mostly intangibles, with a significant chunk being knowledge/software. AGI is only economically viable if it outcompetes humans, so that right there implies an energy constraint: it can’t be 10000x less energy efficient. This constraint is naturally much more stringent for robotic applications.
Then of course the same principles apply when comparing neuromorphic vs von Neumann machines at the end of Moore’s Law. The former is fundamentally multiple OOM more energy efficient than the latter (and just as, or more, circuit-cost efficient), and thus can run multiple OOM faster at the same cost, so it obviously wins.
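The energy constraint in this comment can be written out as a one-line calculation. The sketch below assumes one reading of the comment: that energy is roughly 0.1% of a human worker's total cost. The function names are hypothetical and the figures are the comment's own rough numbers, not measured data.

```python
# Assumed reading of the comment: energy is ~0.1% of a human worker's total cost.
ENERGY_SHARE_OF_HUMAN_COST = 0.001


def agi_energy_cost_vs_human(inefficiency_factor):
    """Energy bill of an AGI worker, as a multiple of a human worker's TOTAL cost.

    inefficiency_factor: how many times more energy the AGI uses than a human
    to do the same work.
    """
    return inefficiency_factor * ENERGY_SHARE_OF_HUMAN_COST


def break_even_inefficiency():
    """Inefficiency factor at which energy alone equals a human's total cost."""
    return 1.0 / ENERGY_SHARE_OF_HUMAN_COST


if __name__ == "__main__":
    for factor in (1, 100, 10_000):
        ratio = agi_energy_cost_vs_human(factor)
        print(f"{factor}x less efficient -> energy is {ratio:g}x a human's total cost")
```

On this reading, energy alone matches a human's full cost at roughly 1000x inefficiency, so a 10000x-less-efficient system would already be well past break-even, consistent with the comment's claim that such a system cannot be viable.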
One possibility is that I’m misinterpreting your conclusion about how the future of AGI is to become more like the brain, not less. I interpreted that to mean that you were forecasting a rise in neuromorphic computing and/or forecasting that the biggest progress in AGI will come from people studying neuroscience to learn from the brain, and (given what you said about brain parity in the introduction) that you don’t think we’ll get AGI until we do those things and make it more like the brain. Do you think those things, or anything adjacent?
Early AGI is somewhat brain-like ANNs running on GPUs; later AGI is even more brain-like ANNs running on neuromorphic/PIM hardware. Hmm, maybe I need to make those parts more clear?
I of course agree that if we had the right knowledge, we could build AGI today and it would probably even run on my laptop.
The article shows how this is probably impossible, just like it would be impossible for you to run the full Google search engine on your 2021 laptop.
And realistically I think we are more likely to get AGI by scaling up models and running them on massive supercomputers (for much more energy cost than the human brain uses!) than by achieving great new insights of knowledge such that we can run 1000 AGI on 1000 2021 GPUs.
Lol what do you think a modern supercomputer is, if not thousands of GPUs? There are scaling limits to parallelization, as mentioned. Or perhaps you are confused by the 1000 instance thing, but as I tried to explain: a single AGI instance is just as expensive as ~1000, at least on current non-neuromorphic hardware. (So you always get 1000-ish instances; see the circuit section.)
I get the sense that we are talking past each other. I wonder if part of what’s happening here is that you have a broader notion of what counts as neuromorphic hardware and brain-like AI than I did, and are therefore making a much weaker claim than I thought you were. I can’t tell for sure but some of the things you’ve said recently make me think this.
I know that modern supercomputers are thousands of GPUs. That isn’t in conflict with what I said. I understand that on current hardware anyone able to make 1 AGI will be able to easily make many, for the reasons you mentioned.
I’m not sure what you meant by the claims I objected to, so I’ll stop trying to argue against them. I do still stand by what I said about how energy is not currently a taut constraint, and your post sure did give the impression that you thought it was. Or maybe you were just saying you think it will eventually become one?
I provided some links to neuromorphic hardware research, and I sometimes lump it in with PIM (Processor in Memory) architecture. It’s an architecture where memory and compute are unified with some artificial synapse-like thing (eg memristors). It’s necessarily brain-like, as the thing it’s really good at is running (low precision) ANNs efficiently.
The end of Moore’s Law is actually a series of incomplete barriers, each of which only allows an increasingly narrower computational design to scale past that barrier: the end of Dennard scaling blocked serial architectures (CPUs); next up, the end of energy scaling will block von Neumann architectures (GPUs/TPUs), allowing only neuromorphic/PIM to scale much further; then there is the final size scaling barrier for all irreversible computation, and only exotic reversible/quantum computers scale past that.
Your comment about your laptop running AGI suggested you had a different model for the min hardware requirements in terms of RAM, RAM bandwidth, and flops.
The end of Moore’s Law is actually a series of incomplete barriers, each of which only allows an increasingly narrower computational design to scale past that barrier: the end of Dennard scaling blocked serial architectures (CPUs); next up, the end of energy scaling will block von Neumann architectures (GPUs/TPUs), allowing only neuromorphic/PIM to scale much further; then there is the final size scaling barrier for all irreversible computation, and only exotic reversible/quantum computers scale past that.
Funnily enough, if this paragraph had appeared in the original text by way of explanation for what you meant by “The future of AGI is to become more like the brain, not less.” then I would not have objected. Sorry for the misunderstanding. I do still think we have some sort of disagreement about takeoff and timelines modelling, but maybe we don’t.
I shouldn’t have said laptop; I should have said whatever it was you said (GPUs etc.) I happen to also believe it could in principle be done on a laptop with the right knowledge (imagine God himself wrote the code) but I shouldn’t have opened that can of worms. I agree that for all practical purposes it may as well be impossible.
If I had to guess at the crux between your disagreement on timelines, I think you might disagree about the FOOM process itself, but not about energy as a taut constraint to the first human-level AGI (which you both seem to agree isn’t the case). Per Jacob’s model, if a FOOM requires the AGI to quickly become much much smarter than humans, that excess smartness will inherently come with a massive electrical cost, which will cap it out at O(10^9) human-brain-equivalents until it can substantially increase world energy output. This would serve to arrest FOOM at roughly human-civilization-scale collective intelligence, except with much better coordination abilities.
To me, this was a pretty significant update, as I was previously imagining FOOMing to not top out before it was way way past human civilization’s collective bio-compute.
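As a rough sanity check on the O(10^9) figure, here is a back-of-envelope sketch. The brain wattage and world electrical output below are my own ballpark assumptions, not numbers from the thread; only the 30x power ratio comes from the earlier comment. Treat the result as an order-of-magnitude illustration, not a derivation of Jacob's model.

```python
# ASSUMPTIONS (not from the thread): ballpark figures for illustration only.
BRAIN_WATTS = 20.0            # commonly cited rough power draw of a human brain
WORLD_ELECTRIC_WATTS = 3e12   # rough average world electrical output, a few TW

# From earlier in the thread: a GPU-hosted instance uses ~30x the brain's power.
GPU_POWER_RATIO = 30.0


def max_brain_equivalents(world_watts, watts_per_instance):
    """Upper bound on simultaneously running brain-equivalents for a power budget."""
    return world_watts / watts_per_instance


if __name__ == "__main__":
    gpu_bound = max_brain_equivalents(WORLD_ELECTRIC_WATTS, BRAIN_WATTS * GPU_POWER_RATIO)
    print(f"GPU-efficiency bound: ~{gpu_bound:.1e} brain-equivalents")
```

With these assumed inputs the bound lands in the 10^9 range, and it moves linearly with either the power budget or the per-instance efficiency.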
What do you mean by “arrest FOOM”? I am quite confident that by the time the intelligence explosion starts winding down, it’ll be past the point of no return for humans. Maybe from the AIs’ perspective progress will have stagnated due to compute constraints, and further progress will happen only once they can design exotic new hardware, so subjectively it feels like aeons of stagnation. But I think that “human-civilization-scale collective intelligence except with much better coordination abilities” is vastly underselling it. It’s like saying caveman humans were roughly elephant-scale intelligences except with better coordination abilities. Or saying that SpaceX is “roughly equivalent to the average US high school 9th grade class, except with better coordination abilities.” Do you disagree with this?
I’m not entirely sure what you mean by “better coordination abilities”, but the primary difference between 9th graders and SpaceX employees is knowledge/training. The primary difference between elephants and caveman humans was the latter possessing language and thus technology/culture and beyond single-lifetime knowledge accumulation.
AGI instances of the same shared mind/model should obviously have a coordination advantage, as should those created by the same organization, but there are many organizations that may be creating AGI.
Even in a world where AGI is running on GPUs and is scale-out bound by energy use and fab output, it may be that a smaller number of larger-than-human minds trained on beyond-human experience have a strong advantage, and in general I’d expect those types of advantages to matter at least as much as ‘coordination abilities’.
I don’t have a precise definition in mind since I was parroting Yonadav. My point was that SpaceX is way better than a random similarly-sized group of high schoolers in many many important ways, even though SpaceX consumes just as many calories/energy as the high schoolers, such that it’s totally misleading to describe them as “roughly equivalent except that SpaceX has a massive coordination advantage.” The only thing roughly equivalent about them is their energy consumption, which just goes to show energy consumption is not a useful metric here.
I totally agree that fewer, larger brains with experience advantages seem likely to outcompete many merely human-sized brains. In fact I think I agree with everything you said in this comment.
No, I agree that coordination is the ballgame, and there’s not a huge practical difference there.
Entirely separately, if we were worried about a treacherous turn due to a system being way above our capabilities, this lowers the probability of that, because there are clearer signals associated with the increase in intelligence needed to out-scheme a team of careful humans (large compute + energy usage). It’s not close to being a solution, but it does bound the tail of arbitrarily-pessimistic outcomes from small-scale projects suddenly FOOMing. It also introduces an additional monitorable real world effect (a spike in energy usage noticeable by energy regulation systems).
Ah, I see. I think 10^9 is not a meaningful number to be talking about; long before there are 10^9 brain-equivalents worth of compute going into AI, we’ll be past the point of no return. But if instead you are talking about an amount of compute large enough that energy companies should be able to detect it, then yeah this seems fairly plausible. Supercomputers can’t be hidden from energy companies as far as I know, and plausibly AGI will appear first in supercomputers, so plausibly wherever AGI appears, it’ll be known by some government that the project was underway at least.
I don’t think this meaningfully lowers the probability of treacherous turn due to a system being way above our capabilities though. That’s because I didn’t put much probability mass on secret-AGI-project-in-a-basement scenarios anyway. I guess if I had, then this would have updated me.
Crypto mining would be affected significantly as well as, or potentially mostly instead of, total energy use. Intelligence is valuable-computation-per-watt, and changing v-c-p-w changes the valuable energy spend of computers that sit idle, so you’d expect projects bidding on this to overtake cryptocurrency mining as the best use of idle computers, whether that’s due to a single project buying up computers and power, or due to a cryptocurrency energy-wasting farm suddenly finding something directly valuable to do with its machines (and in fact it is already the case that ML can pay more than crypto mining).
@jacob_cannell’s argument is simply that the brain has more to tell us about the structure of high-value-per-watt computation than expected by AI philosophers. It does not mean the brain is at the absolute limit of generalized algorithmic energy efficiency (aka the only possible generalized intelligence metric); it only means that the structure of physical limits on algorithmic energy efficiency must be obeyed by any intelligent system, and while there may be large asymptotic speedups from larger scale structure improvement, the local efficiency of the brain is nothing to sneeze at.
Perhaps ASI could be done earlier by “wasting” energy on lower value-per-watt AI projects, and in fact there’s no reason to believe otherwise from available research progress. All AI progress that has ever occurred, after all, has been on a lower generalized value-per-watt compute substrate than human brains can provide, but in return for being on thermodynamically inefficient computers, it gets benefits that can economically compete with humans (eg via algorithmic specialization, high precision math, or exact repeatability), and thereby AI research makes progress towards ever-increasing value-of-compute-output-per-watt.
If a system is AGI, it means that it is within a constant factor of the human brain’s energy efficiency for nearly all tasks: potentially a large constant factor, but a constant factor nonetheless. If it’s just barely general superintelligence and is wildly inefficient at small scales, then the only possible way it could be superintelligence is because it scales (maybe just barely) better than the brain with problem difficulty, extracting asymptotically better value-per-watt than an equivalently scaled system of humans consuming the same number of watts, due to what must ground out to improved total-system-thermodynamic-efficiency-per-unit-useful-computation.
Your proposal seems to be that we should expect a large scale multi-agent AI system to be superintelligence in this larger-scale asymptotic respect, despite the human brain having shockingly high interconnect efficiency and basic thermal compute efficiency. I have no disagreement. What this does tell us is that deep learning has neither a unique expected qualitative advantage nor an expected qualitative disadvantage vs the brain. If it becomes able to find more energy-efficient routes through its processing substrate’s spacetime (ie more energy efficient algorithms, ie more intelligent algorithms), then it wins. Predicting when that will happen, which teams are close, and guaranteeing safety become the remaining issues: guaranteeing that the resulting system does not cause mass energy-structure-aka-data loss (eg, death, body damage, injury, memory loss, hdd corruption/erasure, failure to cryonically freeze as-yet-unrepairable beings, etc) nor interfere significantly with the values of living beings (torture, energy-budget squeeze, cryonic freezing of beings who wish to continue operating, etc).
(Due to the cycles seen in evolutionary game theory, I suspect that an unsafe or bad-at-distributed-systems-fairness AGI mega-network will moderately quickly collapse with high-defection-rate issues similar to those of the human society we have; and if it exterminates and then succeeds humanity, I’d guess it will eventually evolve a large scale cooperative system again; but there’s no reason to believe it wouldn’t kill us first. Friendly multi-agent systems are the hardest part of this whole thing, IMO.)
Great. Thanks.