Energy efficiency is not relevant to predicting the future of AGI. Who cares if EfficientZero-9000 costs ten thousand times as much energy per computation as John von Neumann did, if it’s qualitatively smarter than him in every way and also thinks a thousand times faster? (Similar things can be said about various other kinds of efficiency. Data efficiency seems like the only kind that is plausibly relevant.)
Part of the intent of this article is to taboo ‘smarter’ and break down cognitive efficiency into more detailed, analyzable sub-metrics.
Since you call your model “EZ-9000”, I’m going to assume it’s running on GPUs or equivalent. If EZ-9000 uses 10,000x more energy but runs 1000x faster, i.e. if you mean 100kHz for 100kW, then that isn’t so different from—and is in fact more efficient than—my model of 1000 agent instances using 1MW total. Training is easily parallelizable, so 1000x parallelization is almost as good as a 1000x serial speedup; either way you get a human-lifetime-equivalent of experience in about a month.
If instead you meant 100kW for only 100Hz, so 100MW for 100kHz, then that doesn’t change the net time to train—it still takes about a month, it just costs much more: roughly $5,000/hr for the electricity alone, so perhaps $10M total. Not all that different, though, unless the hardware cost is also proportionally more expensive.
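For concreteness, here is a minimal back-of-the-envelope sketch of the electricity cost under the two readings. The power figures come from the paragraph above; the ~$0.05/kWh rate and the ~1 month of wall-clock training are my assumptions.

```python
# Electricity-cost sketch for the two readings of "10,000x energy, 1000x speed".
# Assumed: ~$0.05/kWh industrial electricity; ~1 month (720 hr) of wall-clock training
# to reach a human-lifetime-equivalent of experience at 1000x effective speedup.
PRICE_PER_KWH = 0.05   # USD, assumed
TRAIN_HOURS = 30 * 24  # ~1 month

def training_electricity_cost(power_kw: float) -> float:
    """Electricity cost (USD) for a training run at the given sustained power draw."""
    return power_kw * TRAIN_HOURS * PRICE_PER_KWH

# Reading 1: 100 kW buys 100 kHz (1000x serial speedup), comparable to 1000 parallel agents at ~1 MW.
print(f"100 kW for a month: ~${training_electricity_cost(100):,.0f}")      # ~$3,600
# Reading 2: 100 kW only buys 100 Hz, so 100 kHz needs 100 MW (~$5,000/hr at this rate).
print(f"100 MW for a month: ~${training_electricity_cost(100_000):,.0f}")  # ~$3.6M electricity alone
```

The ~$10M total presumably folds in cooling and other overhead on top of the raw electricity.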
But in practice a 1000x speedup just isn’t possible in any near-future time frame for a brain-sized model on parallel von Neumann hardware like GPUs (though 1000x parallelization is); my article outlines the precise physics of why this is so. (It is of course probably possible for more advanced neuromorphic hardware, but that probably comes after AGI on GPUs.)
But to analyze your thought experiment further would require fixing many more details—what’s the model size? What kind of hardware? etc. My article allows estimating energy and ultimately training costs, which can then feed into forecasts.
You link to your Birds/Brains/Planes article, which I generally agree with (and indeed have an unfinished similar post from a while back!), and will just quote from your own summary:
I argue that an entire class of common arguments against short timelines is bogus, and provide weak evidence that anchoring to the human-brain-human-lifetime milestone is reasonable.
My brain efficiency argument provides further evidence for anchoring to the human-brain-human-lifetime milestone (with some unknown factor for finding the right design), agrees that the brain is efficient, and is in fact an argument for short timelines to AGI (as it shows the brain really can’t be doing much more flops/s than current GPUs!).
In a sentence, my argument is that the complexity and mysteriousness and efficiency of the human brain (compared to artificial neural nets) is almost zero evidence that building TAI will be difficult, because evolution typically makes things complex and mysterious and efficient, even when there are simple, easily understood, inefficient designs that work almost as well (or even better!) for human purposes.
I also agree that the first human-architected AGI will likely be inefficient compared to the brain on various key metrics—if you got to the end of the post, you saw that the near-future AGI I estimate there runs on GPUs and is ~2 OOM less energy efficient, but could still provide several OOM of economic advantage.
So I’m actually not clear on what we disagree on? Other than your statement that “Energy efficiency is not relevant to predicting the future of AGI”, which is almost obviously false as stated. For a simple counter-argument: any AGI design that uses 10^9 times more energy than the human brain is probably economically infeasible to train.
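A minimal sketch of why 10^9x is a blocker while 10^2-10^3x is not; the ~10W brain power, ~30-year lifetime-equivalent training run, and ~$0.10/kWh rate are assumptions, and only the multipliers come from the discussion itself.

```python
# Why 10^9x worse energy efficiency than the brain is probably a training blocker,
# while 10^2-10^3x is not. Assumed: ~10 W brain power, ~30 yr lifetime-equivalent
# training run, ~$0.10/kWh electricity.
BRAIN_WATTS = 10
LIFETIME_SECONDS = 30 * 365 * 24 * 3600                       # ~9.5e8 s
brain_training_kwh = BRAIN_WATTS * LIFETIME_SECONDS / 3.6e6   # ~2,600 kWh (~$260)

for factor in (1e2, 1e3, 1e9):
    kwh = brain_training_kwh * factor
    print(f"{factor:.0e}x brain energy: ~{kwh:.1e} kWh, ~${kwh * 0.10:,.0f}")
# 10^2-10^3x: tens to hundreds of thousands of dollars -- annoying but feasible.
# 10^9x: ~2.6e12 kWh (~$260B), i.e. roughly a tenth of annual world electricity generation.
```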
“Energy efficiency is not relevant...” is false in the same way that “Cheese efficiency is not relevant...” is false. (Cheese efficiency is how much cheese an AI design consumes. You might think this is not relevant because most current and future AI designs consume negligible amounts of cheese, but hypothetically if an AGI design consumed 10^9 kg of cheese per second it would not be viable.)
This is just an aggressive way of saying that energy is cheap & the builders of AGI will be willing to buy lots of it to fuel their AGI. The brain may be super efficient given its energy constraints but AGI does not have an energy constraint, for practical purposes. Sure, 10^9 times less energy efficient would be a problem, but 10^3 wouldn’t be. And if I understand you correctly you are saying that modern GPUs are only 10^2 times less energy efficient.
At the end of the article I discuss/estimate near-future brain-scale AGI requiring 1000 GPUs for 1000 brain-size agents in parallel, using roughly 1MW total, or 1kW per agent instance. That works out to about $2,000/yr for the power & cooling cost. Or if we estimate directly from vast.ai prices, it’s more like $5,000/yr per agent total for hardware rental (including power costs). The rental price using enterprise GPUs is at least 4x as much, so more like $20,000/yr per agent. So the potential economic advantage is not yet multiple OOM. It’s actually more like little to no advantage for low-end robotic labor, or perhaps 1 OOM advantage for programmers/researchers/etc. But if we had AGI today, GPU prices would just skyrocket to arbitrage that advantage, at least until foundries could ramp up GPU production.
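A sketch of how those per-agent numbers fall out; the 1 MW / 1000-agent split and the rental prices are from the paragraph above, while the electricity rate, cooling overhead, and the illustrative worker salary are assumptions.

```python
# Per-agent running-cost sketch for the "1000 GPUs, 1000 brain-size agents, ~1 MW total" scenario.
# Assumed: ~$0.10/kWh, ~2x overhead for cooling/facilities; rental prices as quoted above;
# the $150k/yr knowledge-worker cost is purely illustrative.
HOURS_PER_YEAR = 8760
kw_per_agent = 1000 / 1000                                    # 1 MW across 1000 agents = ~1 kW each
power_cooling = kw_per_agent * HOURS_PER_YEAR * 0.10 * 2      # ~$1,750/yr, i.e. the ~$2,000/yr figure
print(f"power & cooling per agent: ~${power_cooling:,.0f}/yr")

vast_rental = 5_000        # $/yr per agent, consumer-GPU rental incl. power (estimate above)
enterprise_rental = 20_000 # $/yr per agent, enterprise GPUs at ~4x
worker_cost = 150_000      # $/yr, illustrative loaded cost of a programmer/researcher
print(f"advantage vs such a worker: ~{worker_cost / enterprise_rental:.0f}x")  # ~1 OOM, not several
```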
This does not sound like a taut constraint to me. Sure, an agent that is dumber than a dumb human and costs $20,000/yr to run won’t be transforming the economy anytime soon. But once we get such an agent, after a few additional years (months? days?) of R&D we’ll have agents that are smarter than a smart human and cost $20,000/yr to run, and by that point we are in FOOM territory. (And this neglects the fact that you chose the higher numbers for your estimate, and also that the price of compute and the price of energy will be going down over the next decade.) [Unimportant aside: I don’t get the point about arbitrage. When nukes were invented, the price of uranium probably went up. So what?]
I worry that I’m straw-manning you and/or just not comprehending your argument so in the next few days I plan to reread your post more closely and think more carefully about it. The point I’m making is similar to what Vaniver and Steven Byrnes said, I think.
Computation literally is organized energy[1]. Intelligence is a particular efficient organization of computational energy towards evolutionary/economic goals.
Sure, 10^9 times less energy efficient would be a problem, but 10^3 wouldn’t be. And if I understand you correctly you are saying that modern GPUs are only 10^2 times less energy efficient.
Yeah, so again it’s not clear to me what exactly the crux here is, other than some surface level thing where we both agree 10^9 energy efficiency gap would be a blocker, and agree 10^3 or 10^2 isn’t, but then you would label that as “Energy efficiency is not relevant”.
Sure, an agent that is dumber than a dumb human and costs $20,000/yr to run won’t be transforming the economy anytime soon. But once we get such an agent, after a few additional years (months? days?) of R&D we’ll have agents that are smarter than a smart human and cost $20,000/yr to run, and by that point we are in FOOM territory.
The question of when we’ll get almost-human-level agents for $20,000/yr, vs smart-human-level for $1,000/yr, vs today where almost-human level costs unknown large amounts, perhaps $billions—is ultimately an energy-efficiency-constrained question[2].
Thus the cheese analogy is nonsensical. And because computation literally is energy, computational efficiency is ultimately various forms of energy efficiency.
Although, again, to reiterate what I said in the article: the principal blocker today for early AGI is knowledge, because GPUs are probably only a few OOM less energy efficient at the hardware level (our current net inefficiency is more on the algorithm/software side). But even that doesn’t make low-level circuit energy efficiency irrelevant: it constrains takeoff speed and especially the form/shape of AGI.
I was wrong to link my Birds/Brains/Planes post, btw; you are right, it doesn’t really contradict what you are saying. As for the cheese analogy… I still think I’m right, but I’ll rest my case.
Like I said at the top, I really appreciate this post and learned a lot from it—I just think it draws some erroneous conclusions. It’s possible I’m just not understanding the argument though.
For any particular energy budget there is a Landauer Limit imposed maximum net communication flow rate through the system and a direct tradeoff between clock speed and accessible memory size at that flow rate.
Yes, and you go on to argue that the brain is operating about as fast as it could possibly operate given its tiny energy budget. But current and future computers will have much, much larger energy budgets. They can therefore operate much faster (and they do).
Correct me if I’m wrong, but my impression is that currently paying for energy is less than 10% of the cost of compute. Most of the cost is the hardware itself, and maintaining the facilities. In light of that, it really does seem that we are not energy-constrained. Maybe in the future we will be, but for now, the cost of the energy is small compared to the cost of everything else that goes into training and running AI. So chip designers and AI designers are free to use high energy budgets if it gets other benefits like faster speed or cheaper manufacturing or whatever. If they are using high energy budgets, they don’t need to build chips to be more and more like the human brain, which has a low energy budget. In other words, they don’t need this:
Achieving those levels of energy efficiency will probably require brain-like neuromorphic-ish hardware, circuits, and learned software via training/education. The future of AGI is to become more like the brain, not less.
Nor need chip designers optimize towards maximal energy efficiency; energy efficiency is not top of the priority list for things to optimize for, since energy is only a small fraction of the cost:
Likewise, DL evolving towards AGI converges on brain reverse engineering[60][61], especially when optimizing towards maximal energy efficiency for complex real world tasks.
Meanwhile, this is off-base too:
Why should we care? Brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity. If the brain is about 6 OOM away from the practical physical limits of energy efficiency, then roughly speaking we should expect about 6 OOM of further Moore’s Law hardware improvement past the point of brain parity:
AGI is not defined as hardware that performs computations as energy-efficiently as the brain. Instead, it is software that performs all important intellectual tasks as effectively as the brain, cost be damned. The goal of the field of AI is not to equal the brain in energy-efficiency, any more than the goal of powered flight is to produce machines as energy-efficient as birds.
One possibility is that I’m misinterpreting your conclusion about how the future of AGI is to become more like the brain, not less. I interpreted that to mean that you were forecasting a rise in neuromorphic computing and/or forecasting that the biggest progress in AGI will come from people studying neuroscience to learn from the brain, and (given what you said about brain parity in the introduction) that you don’t think we’ll get AGI until we do those things and make it more like the brain. Do you think those things, or anything adjacent? If not, then maybe we don’t disagree after all. (Though then I wonder what you meant by “The future of AGI is to become more like the brain, not less.” And also it still seems like we have some sort of disagreement about the importance of energy efficiency more generally.)
The question of when we’ll get almost-human-level agents for $20,000/yr, vs smart-human-level for $1,000/yr, vs today where almost-human level costs unknown large amounts, perhaps $billions—is ultimately an energy-efficiency-constrained question. … Although, again, to reiterate what I said in the article: the principal blocker today for early AGI is knowledge, because GPUs are probably only a few OOM less energy efficient at the hardware level (our current net inefficiency is more on the algorithm/software side). But even that doesn’t make low-level circuit energy efficiency irrelevant: it constrains takeoff speed and especially the form/shape of AGI
Why? Am I wrong that energy is <10% the cost of compute? How is energy efficiency a taut constraint then? Or are you merely saying that it is a constraint, not a taut one? Just as cheese efficiency is a constraint, but not a taut one?
I of course agree that if we had the right knowledge, we could build AGI today and it would probably even run on my laptop. I think it doesn’t follow that the principal blocker today for early AGI is knowledge. There are lots of things X such that if we had X we could build AGI today. I think it’s only appropriate to label as “the principal blocker” the one that is realistically most likely to be achieved first. And realistically I think we are more likely to get AGI by scaling up models and running them on massive supercomputers (for much more energy cost than the human brain uses!) than by achieving great new insights of knowledge such that we can run 1000 AGI on 1000 2021 GPUs. (However, on this point we can agree to disagree; it’s mostly a matter of intuition anyway.)
But current and future computers will have much, much larger energy budgets. They can therefore operate much faster (and they do).
Faster clock speed, but not faster thought speed, as they just burn all that speed inefficiently simulating a large circuit. Even though a single GPU has nominal flops similar to the brain’s and uses ~30x more power, it has about 3 OOM less memory and memory bandwidth. GPUs are amazing at simulating insect brains at high speeds.
But we want big brain-scale ANNs, as that is what intelligence requires. So you need ~1000 GPUs in parallel, with complex, expensive, high-bandwidth interconnect, to get a single brain-size ANN, at which point you also get ~1000 instances (of the same mind). That only allows you to run it at brain speed, not any faster. You can’t then just run it on 1 million GPUs to get a 1000x speedup—that’s not how it works at all. Instead you’d get 1 million instances of 1000 brain-size ANNs. This ultimately relates to energy flow efficiency—see the section on circuits. Energy efficiency is a complex multi-dimensional engineering constraint set, not a simple linear economic multiplier.
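A sketch of the memory arithmetic behind “you need ~1000 GPUs and then automatically get ~1000 instances”; the synapse count and bytes-per-parameter are rough assumptions, and 80GB is the fast memory of a current high-end datacenter GPU.

```python
# Why a brain-scale ANN on GPUs yields ~1000 parallel instances rather than a 1000x serial speedup.
# Assumed: ~1e14 synapse-equivalent parameters at ~1 byte each; ~80 GB of fast memory per GPU.
PARAMS = 1e14
BYTES_PER_PARAM = 1
GPU_MEM_BYTES = 80e9

gpus_to_hold_weights = PARAMS * BYTES_PER_PARAM / GPU_MEM_BYTES
print(f"GPUs just to hold the weights: ~{gpus_to_hold_weights:,.0f}")   # ~1,250

# Every active weight must stream from memory each "thought step" regardless of batch size,
# so amortizing that bandwidth means running a large batch: ~1000 GPUs => ~1000 concurrent
# instances of the same mind at roughly brain speed. Adding 1000x more GPUs mostly adds
# more instances; the serial step time stays bound by memory and interconnect, not flops.
```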
Moore’s Law isn’t going to improve this scenario much—at least not for GPUs or any von Neumann style architecture.
Moore’s Law will eventually allow a very specific narrow class of designs to simultaneously achieve brain scale and high speedup, but that narrow class of designs is necessarily neuromorphic and similar to an artificial brain. Furthermore, economic pressure will naturally push the industry towards neuromorphic brain style AGI designs, as they will massively outcompete everything else.
These are the engineering constraints from physics the article is attempting to elucidate.
If they are using high energy budgets, they don’t need to build chips to be more and more like the human brain, which has a low energy budget. In other words, they don’t need this:
Given the choice between a neuromorphic design which can run 1,000 instances of 1,000 unique agent minds at 100x the speed of human thought, and a von Neumann type design which can run 1,000 instances of only 1 agent mind at 1x the speed of human thought at the same price—the latter is not competitive.
AGI is not defined as hardware that performs computations as energy-efficiently as the brain. Instead, it is software that performs all important intellectual tasks as effectively as the brain, cost be damned.
The energy cost of a human worker is something like 0.1% of their total cost/value; the rest is mostly intangibles, with a significant chunk being knowledge/software. AGI is only economically viable if it outcompetes humans, so that right there implies an energy constraint: it can’t be ~10,000x less energy efficient. This constraint is naturally much more stringent for robotic applications.
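A sketch of the “~0.1% energy equivalent” claim and the bound it implies; the 20W metabolic figure, electricity price, and worker cost are assumptions chosen for round numbers.

```python
# Sketch of "a human worker's energy cost is roughly 0.1% of their total cost/value"
# and the energy-efficiency bound this implies for a competing AGI worker.
# Assumed: ~20 W brain metabolic power, ~$0.10/kWh, ~$50k/yr total worker cost.
brain_kwh_per_year = 20 / 1000 * 8760              # ~175 kWh/yr
brain_energy_cost = brain_kwh_per_year * 0.10      # ~$18/yr
worker_cost = 50_000
print(f"energy share of worker cost: {brain_energy_cost / worker_cost:.3%}")   # order 0.1%

# The implied constraint: at ~10,000x the brain's energy use, electricity alone
# (~$175k/yr here) already rivals the cost of the worker it is supposed to replace.
print(f"10,000x brain energy: ~${10_000 * brain_energy_cost:,.0f}/yr in electricity")
```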
Then of course the same principles apply when comparing neuromorphic vs von Neumann machines at the end of Moore’s Law. The former is fundamentally multiple OOM more energy efficient than the latter (and just as or more circuit-cost efficient), and thus can run multiple OOM faster at the same cost, so it obviously wins.
One possibility is that I’m misinterpreting your conclusion about how the future of AGI is to become more like the brain, not less. I interpreted that to mean that you were forecasting a rise in neuromorphic computing and/or forecasting that the biggest progress in AGI will come from people studying neuroscience to learn from the brain, and (given what you said about brain parity in the introduction) that you don’t think we’ll get AGI until we do those things and make it more like the brain. Do you think those things, or anything adjacent?
Early AGI is somewhat brain-like ANNs running on GPUs; later AGI is even more brain-like ANNs running on neuromorphic/PIM hardware. Hmm, maybe I need to make those parts more clear?
I of course agree that if we had the right knowledge, we could build AGI today and it would probably even run on my laptop.
The article shows how this is probably impossible, just like it would be impossible for you to run the full Google search engine on your 2021 laptop.
And realistically I think we are more likely to get AGI by scaling up models and running them on massive supercomputers (for much more energy cost than the human brain uses!) than by achieving great new insights of knowledge such that we can run 1000 AGI on 1000 2021 GPUs.
Lol, what do you think a modern supercomputer is, if not thousands of GPUs? There are scaling limits to parallelization, as mentioned. Or perhaps you are confused by the 1000-instance thing, but as I tried to explain: a single AGI instance is just as expensive as ~1000, at least on current non-neuromorphic hardware. (So you always get 1000-ish instances; see the circuit section.)
I get the sense that we are talking past each other. I wonder if part of what’s happening here is that you have a broader notion of what counts as neuromorphic hardware and brain-like AI than I did, and are therefore making a much weaker claim than I thought you were. I can’t tell for sure but some of the things you’ve said recently make me think this.
I know that modern supercomputers are thousands of GPUs. That isn’t in conflict with what I said. I understand that on current hardware anyone able to make 1 AGI will be able to easily make many, for the reasons you mentioned.
I’m not sure what you meant by the claims I objected to, so I’ll stop trying to argue against them. I do still stand by what I said about how energy is not currently a taut constraint, and your post sure did give the impression that you thought it was. Or maybe you were just saying you think it will eventually become one?
I provided some links to neuromorphic hardware research, and I sometimes lump it in with PIM (Processor-in-Memory) architecture. It’s an architecture where memory and compute are unified with some artificial synapse-like element—e.g. memristors. It’s necessarily brain-like, as the thing it’s really good at is running (low precision) ANNs efficiently.
The end of Moore’s Law is actually a series of incomplete barriers, each of which only allows an increasingly narrow class of computational designs to scale past it: Dennard scaling blocked serial architectures (CPUs); next, the end of energy scaling will block von Neumann architectures (GPUs/TPUs), allowing only neuromorphic/PIM to scale much further; then there is the final size-scaling barrier for all irreversible computation, and only exotic reversible/quantum computers scale past that.
Your comment about your laptop running AGI suggested you had a different model for the minimum hardware requirements in terms of RAM, RAM bandwidth, and flops.
The end of Moore’s Law is actually a series of incomplete barriers, each of which only allows an increasingly narrow class of computational designs to scale past it: Dennard scaling blocked serial architectures (CPUs); next, the end of energy scaling will block von Neumann architectures (GPUs/TPUs), allowing only neuromorphic/PIM to scale much further; then there is the final size-scaling barrier for all irreversible computation, and only exotic reversible/quantum computers scale past that.
Funnily enough, if this paragraph had appeared in the original text by way of explanation for what you meant by “The future of AGI is to become more like the brain, not less.” then I would not have objected. Sorry for the misunderstanding. I do still think we have some sort of disagreement about takeoff and timelines modelling, but maybe we don’t.
I shouldn’t have said laptop; I should have said whatever it was you said (GPUs etc.) I happen to also believe it could in principle be done on a laptop with the right knowledge (imagine God himself wrote the code) but I shouldn’t have opened that can of worms. I agree that for all practical purposes it may as well be impossible.
If I had to guess at the crux of your disagreement on timelines, I think you might disagree about the FOOM process itself, but not about energy being a taut constraint on the first human-level AGI (which you both seem to agree isn’t the case). Per Jacob’s model, if a FOOM requires the AGI to quickly become much, much smarter than humans, that excess smartness will inherently come with a massive electrical cost, which will cap it out at O(10^9) human-brain-equivalents until it can substantially increase world energy output. This would serve to arrest FOOM at roughly human-civilization-scale collective intelligence, except with much better coordination abilities.
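A sketch of where the O(10^9) cap comes from, under the thread’s GPU-era figure of ~1 kW of compute per brain-equivalent agent; the world electricity number is a rough estimate of current annual generation.

```python
# Why ~O(10^9) human-brain-equivalents is roughly where world energy supply starts to bind,
# assuming ~1 kW of GPU compute per brain-equivalent agent (the figure used earlier in the thread).
WORLD_ELECTRICITY_TWH_PER_YEAR = 27_000                                  # rough current annual generation
world_avg_power_watts = WORLD_ELECTRICITY_TWH_PER_YEAR * 1e12 / 8760    # ~3.1e12 W average draw

WATTS_PER_AGENT = 1_000
max_agents = world_avg_power_watts / WATTS_PER_AGENT
print(f"agents supportable on ALL current electricity: ~{max_agents:.1e}")   # ~3e9

# So a population around 10^9 such agents already claims a large fraction of world electricity,
# which is the sense in which FOOM stalls until energy output grows substantially.
```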
To me, this was a pretty significant update, as I was previously imagining FOOMing to not top out before it was way way past human civilization’s collective bio-compute.
What do you mean by “arrest FOOM?” I am quite confident that by the time the intelligence explosion starts winding down, it’ll be past the point of no return for humans. Maybe from the AIs perspective progress will have stagnated due to compute constraints, and further progress will happen only once they can design exotic new hardware, so subjectively it feels like aeons of stagnation. But I think that “human-civilization-scale collective intelligence except with much better coordination abilities” is vastly underselling it. It’s like saying caveman humans were roughly elephant-scale intelligences except with better coordination abilities. Or saying that SpaceX is “roughly equivalent to the average US high school 9th grade class, except with better coordination abilities.” Do you disagree with this?
I’m not entirely sure what you mean by “better coordination abilities”, but the primary difference between 9th graders and SpaceX employees is knowledge/training. The primary difference between elephants and caveman humans was the latter possessing language, and thus technology/culture and knowledge accumulation beyond a single lifetime.
AGI instances of the same shared mind/model should obviously have a coordination advantage, as should those created by the same organization, but there are many organizations that may be creating AGI.
Even in a world where AGI is running on GPUs and is scale-out bound by energy use and fab output, it may be that a smaller number of larger-than-human minds trained on beyond-human experience have a strong advantage, and in general I’d expect those types of advantages to matter at least as much as ‘coordination abilities’.
I don’t have a precise definition in mind since I was parroting Yonadav. My point was that SpaceX is way better than a random similarly-sized group of high schoolers in many, many important ways, even though SpaceX consumes just as many calories/energy as the high schoolers, such that it’s totally misleading to describe them as “roughly equivalent except that SpaceX has a massive coordination advantage.” The only thing roughly equivalent about them is their energy consumption, which just goes to show energy consumption is not a useful metric here.
I totally agree that fewer, larger brains with experience advantages seem likely to outcompete many merely human-sized brains. In fact I think I agree with everything you said in this comment.
No, I agree that coordination is the ballgame, and there’s not a huge practical difference there.
Entirely separately: if we were worried about a treacherous turn due to a system being way above our capabilities, this lowers the probability of that, because there are clearer signals associated with the increase in intelligence needed to out-scheme a team of careful humans (large compute + energy usage). It’s not close to being a solution, but it does bound the tail of arbitrarily-pessimistic outcomes from small-scale projects suddenly FOOMing. It also introduces an additional monitorable real-world effect (a spike in energy usage noticeable by energy regulation systems).
Ah, I see. I think 10^9 is not a meaningful number to be talking about; long before there are 10^9 brain-equivalents worth of compute going into AI, we’ll be past the point of no return. But if instead you are talking about an amount of compute large enough that energy companies should be able to detect it, then yeah, this seems fairly plausible. Supercomputers can’t be hidden from energy companies as far as I know, and plausibly AGI will appear first in supercomputers, so plausibly wherever AGI appears, it’ll at least be known by some government that the project was under way.
I don’t think this meaningfully lowers the probability of treacherous turn due to a system being way above our capabilities though. That’s because I didn’t put much probability mass on secret-AGI-project-in-a-basement scenarios anyway. I guess if I had, then this would have updated me.
Crypto mining would be affected significantly as well, potentially more so than total energy use. Intelligence is valuable computation per watt; changing valuable-computation-per-watt changes the most valuable way to spend energy on computers that would otherwise sit idle. So you’d expect projects bidding on this to overtake cryptocurrency mining as the best use of idle computers, whether that’s because a single project buys up computers and power, or because a cryptocurrency energy-wasting farm suddenly finds something directly valuable to do with its machines (and in fact it is already the case that ML can pay more than crypto mining).
@jacob_cannell’s argument is simply that the brain has more to tell us about the structure of high-value-per-watt computation than AI philosophers expected. It does not mean the brain is at the absolute limit of generalized algorithmic energy efficiency (aka the only possible generalized intelligence metric); it only means that the physical limits on algorithmic energy efficiency must be obeyed by any intelligent system, and while there may be large asymptotic speedups from larger-scale structural improvements, the local efficiency of the brain is nothing to shake a stick at.
Perhaps ASI could be achieved earlier by “wasting” energy on lower value-per-watt AI projects—and in fact, there’s no reason to believe otherwise from available research progress. All AI progress that has ever occurred, after all, has been on compute substrate with lower generalized value-per-watt than human brains provide; but in return for running on thermodynamically inefficient computers, it gets benefits that can economically compete with humans—e.g. via algorithmic specialization, high-precision math, or exact repeatability—and thereby AI research makes progress towards ever-increasing value-of-compute-output-per-watt.
If a system is AGI, that means it is within a constant factor of the human brain’s energy efficiency for nearly all tasks—potentially a large constant factor, but a constant factor nonetheless. If it’s just barely general superintelligence and is wildly inefficient at small scales, then the only way it could be superintelligent is that it scales (maybe just barely) better than the brain with problem difficulty—extracting asymptotically better value-per-watt than an equivalently scaled system of humans consuming the same number of watts, due to what must ground out in improved total-system thermodynamic efficiency per unit of useful computation.
Your proposal seems to be that we should expect a large-scale multi-agent AI system to be superintelligent in this larger-scale asymptotic respect, despite the fact that the human brain has shockingly high interconnect efficiency and basic thermal compute efficiency. I have no disagreement. What this does tell us is that deep learning doesn’t have a unique expected qualitative advantage or disadvantage vs the brain. If it becomes able to find more energy-efficient routes through its processing substrate’s spacetime (i.e. more energy-efficient algorithms, i.e. more intelligent algorithms), then it wins. Predicting when that will happen, which teams are close, and guaranteeing safety becomes the remaining issue: guaranteeing that the resulting system does not cause mass energy-structure-aka-data loss (e.g. death, body damage, injury, memory loss, HDD corruption/erasure, failure to cryonically freeze as-yet-unrepairable beings, etc.) nor interfere significantly with the values of living beings (torture, energy-budget squeeze, cryonic freezing of beings who wish to continue operating, etc.).
(Due to the cycles seen in evolutionary game theory, I suspect that an unsafe or bad-at-distributed-systems-fairness AGI mega-network will fairly quickly collapse with high-defection-rate issues similar to those of the human society we have; and if it exterminates and then succeeds humanity, I’d guess it will eventually evolve a large-scale cooperative system again; but there’s no reason to believe it wouldn’t kill us first. Friendly multi-agent systems are the hardest part of this whole thing, IMO.)