jacob_cannell comments on Brain Efficiency: Much More than You Wanted to Know

jacob_cannell 8 Jan 2022 5:20 UTC
13 points
Computation literally is organized energy^[1]. Intelligence is a particular efficient organization of computational energy towards evolutionary/economic goals.

Sure, 10^9 times less energy efficient would be a problem, but 10^3 wouldn’t be. And if I understand you correctly you are saying that modern GPUs are only 10^2 times less energy efficient.

Yeah, so again it’s not clear to me what exactly the crux here is, other than some surface level thing where we both agree 10^9 energy efficiency gap would be a blocker, and agree 10^3 or 10^2 isn’t, but then you would label that as “Energy efficiency is not relevant”.

Sure, an agent that is dumber than a dumb human and costs $20,000/yr to run won’t be transforming the economy anytime soon. But once we get such an agent, after a few additional years (months? Days?) of R&D we’ll have agents that are smarter than a smart human and cost $20,000/yr to run, and by that point we are in FOOM territory

The question of when we’ll get almost-human level agents for $20,000/yr vs smart-human-level for $1,000/yr vs today where almost-human level costs unknown large amounts, perhaps $billions—is ultimately an energy efficiency constrained question^[2].
1. ↩︎
  Thus the cheese analogy is non-sensical. And because computation literally is energy, computational efficiency is ultimately various forms of energy efficiency.
2. ↩︎
  Although again to reiterate as I said in the article, the principle blocker today for early AGI is knowledge, because GPUs are probably only a few OOM less energy efficient at the hardware level (our current net inefficiency is more on the algorithm/software side). But even that doesn’t make low level circuit energy efficiency irrelevant: it constrains takeoff speed and also especially the form/shape of AGI.
- Daniel Kokotajlo 8 Jan 2022 18:58 UTC
  1 point
  Parent
  I was wrong to link my birds brains planes post btw, you are right, it doesn’t really contradict what you are saying. As for the cheese analogy… I still think I’m right but I’ll rest my case.
  Like I said at the top, I really appreciate this post and learned a lot from it—I just think it draws some erroneous conclusions. It’s possible I’m just not understanding the argument though.
  For any particular energy budget there is a Landauer Limit imposed maximum net communication flow rate through the system and a direct tradeoff between clock speed and accessible memory size at that flow rate.
  Yes, and you go on to argue that the brain is operating about as fast as it could possibly operate given its tiny energy budget. But current and future computers will have much, much larger energy budgets. They can therefore operate much faster (and they do).
  Correct me if I’m wrong, but my impression is that currently paying for energy is less than 10% of the cost of compute. Most of the cost is the hardware itself, and maintaining the facilities. In light of that, it really does seem that we are not energy-constrained. Maybe in the future we will be, but for now, the cost of the energy is small compared to the cost of everything else that goes into training and running AI. So chip designers and AI designers are free to use high energy budgets if it gets other benefits like faster speed or cheaper manufacturing or whatever. If they are using high energy budgets, they don’t need to build chips to be more and more like the human brain, which has a low energy budget. In other words, they don’t need this:
  Achieving those levels of energy efficiency will probably require brain-like neuromorphic-ish hardware, circuits, and learned software via training/education. The future of AGI is to become more like the brain, not less.
  Nor need chip designers optimize towards maximal energy efficiency; energy efficiency is not top of the priority list for things to optimize for, since energy is only a small fraction of the cost:
  Likewise, DL evolving towards AGI converges on brain reverse engineering[60][61], especially when optimizing towards maximal energy efficiency for complex real world tasks.
  Meanwhile, this is off-base too:
  Why should we care? Brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity. If the brain is about 6 OOM away from the practical physical limits of energy efficiency, then roughly speaking we should expect about 6 OOM of further Moore’s Law hardware improvement past the point of brain parity:
  AGI is not defined as hardware that performs computations as energy-efficiently as the brain. Instead, it is software that performs all important intellectual tasks as effectively as the brain, cost be damned. The goal of the field of AI is not to equal the brain in energy-efficiency, any more than the goal of powered flight is to produce machines as energy-efficient as birds.
  
  One possibility is that I’m misinterpreting your conclusion about how the future of AGI is to become more like the brain, not less. I interpreted that to mean that you were forecasting a rise in neuromorphic computing and/or forecasting that the biggest progress in AGI will come from people studying neuroscience to learn from the brain, and (given what you said about brain parity in the introduction) that you don’t think we’ll get AGI until we do those things and make it more like the brain. Do you think those things, or anything adjacent? If not, then maybe we don’t disagree after all. (Though then I wonder what you meant by “The future of AGI is to become more like the brain, not less.” And also it still seems like we have some sort of disagreement about the importance of energy efficiency more generally.)
  The question of when we’ll get almost-human level agents for $20,000/yr vs smart-human-level for $1,000/yr vs today where almost-human level costs unknown large amounts, perhaps $billions—is ultimately an energy efficiency constrained question. … Although again to reiterate as I said in the article, the principle blocker today for early AGI is knowledge, because GPUs are probably only a few OOM less energy efficient at the hardware level (our current net inefficiency is more on the algorithm/software side). But even that doesn’t make low level circuit energy efficiency irrelevant: it constrains takeoff speed and also especially the form/shape of AGI
  Why? Am I wrong that energy is <10% the cost of compute? How is energy efficiency a taut constraint then? Or are you merely saying that it is a constraint, not a taut one? Just as cheese efficiency is a constraint, but not a taut one?
  I of course agree that if we had the right knowledge, we could build AGI today and it would probably even run on my laptop. I think it doesn’t follow that the principle blocker today for early AGI is knowledge. There are lots of things X such that if we had X we could build AGI today. I think it’s only appropriate to label “the principle blocker” the one that is realistically most likely to be achieved first. And realistically I think we are more likely to get AGI by scaling up models and running them on massive supercomputers (for much more energy cost than the human brain uses!) than by achieving great new insights of knowledge such that we can run 1000 AGI on 1000 2021 GPUs. (However, on this point we can agree to disagree, it’s mostly a matter of intuition anyway.)
  - jacob_cannell 8 Jan 2022 21:57 UTC
    11 points
    Parent
    
    But current and future computers will have much, much larger energy budgets. They can therefore operate much faster (and they do).
    
    Faster clock speed but not faster thought speed., as they just burn all that speed inefficiently simulating a large circuit. Even though a single GPU has similar nominal flops compared to the brain and uses 30x more power, they have about 3 OOM less memory and memory bandwidth. GPUs are amazing at simulating insect brains at high speeds.
    
    But we want big brain-scale ANNs, as that is what intelligence requires. So you need 1000x GPUs in parallel with complex expensive high bandwidth interconnect to get a single brain-size ANN, at which point you also get 1000 instances (of the same mind). That only allows you to run it at brain speed, not any faster. You can’t then just run it on 1 million GPUs to get 1000x speedup—that’s not how it works at all. Instead you’d get 1 million instances of 1000 brain size ANNs. This ultimately relate to energy flow efficiency—see the section on circuits. Energy efficiency is a complex multi-dimensional engineering constraint set, it’s not a simple linear economic multiplier.
    
    Moore’s Law isn’t going to improve this scenario much—at least not for GPUs or any von neumman style architecture.
    
    Moore’s Law will eventually allow a very specific narrow class of designs to simultaneously achieve brain scale and high speedup, but that narrow class of designs is necessarily neuromorphic and similar to an artificial brain. Furthermore, economic pressure will naturally push the industry towards neuromorphic brain style AGI designs, as they will massively outcompete everything else.
    
    These are the engineering constraints from physics the article is attempting to elucidate.
    
    If they are using high energy budgets, they don’t need to build chips to be more and more like the human brain, which has a low energy budget. In other words, they don’t need this:
    
    Given the choice between a neuromorphic design which can run 1,000 instances of 1,000 unique agent minds at 100x the speed of human thought, or a von-neumman type design which can run 1,000 instances of only 1 agent mind at 1x the speed of human thought at the same prices—the latter is not competitive.
    
    AGI is not defined as hardware that performs computations as energy-efficiently as the brain. Instead, it is software that performs all important intellectual tasks as effectively as the brain, cost be damned.
    
    The cost/value of a human worker is like 0.1% energy equivalent, and mostly intangibles with a significant chunk being knowledge/software. AGI is only economically viable if it outcompetes humans, so that right there implies an energy constraint that it can’t be 10000x less energy efficient. This constraint is naturally much more stringent for robotic applications.
    
    Then of course the same principles apply when comparing neuromorphic vs von-neumann machines at the end of Moore’s Law. The former is fundamentally multiple OOM more energy efficient than the latter (and just or more circuit cost efficient), and thus can run multiple OOM faster at the same cost, so it obviously wins.
    
    One possibility is that I’m misinterpreting your conclusion about how the future of AGI is to become more like the brain, not less. I interpreted that to mean that you were forecasting a rise in neuromorphic computing and/or forecasting that the biggest progress in AGI will come from people studying neuroscience to learn from the brain, and (given what you said about brain parity in the introduction) that you don’t think we’ll get AGI until we do those things and make it more like the brain. Do you think those things, or anything adjacent?
    
    Early AGI is somewhat brain-like ANNS running on GPUs, later AGI is even more brain-like ANNs running on neuromorphic/PIM hardware. Hmm maybe I need to make those parts more clear?
    
    I of course agree that if we had the right knowledge, we could build AGI today and it would probably even run on my laptop.
    
    The article shows how this is probably impossible, just like it would be impossible for you to run the full Google search engine on your 2021 laptop.
    
    And realistically I think we are more likely to get AGI by scaling up models and running them on massive supercomputers (for much more energy cost than the human brain uses!) than by achieving great new insights of knowledge such that we can run 1000 AGI on 1000 2021 GPUs.
    
    Lol what do you think a modern supercomputer is, if not thousands of GPUs? There are scaling limits to parallelization, as mentioned. Or perhaps you are confused by the 1000 instance thing, but as I tried to explain: a single AGI instance is just as expensive as ~1000, at least on current non-neurmorphic hardware. (So you always get 1000-ish instances, see the circuit section)
    - Daniel Kokotajlo 9 Jan 2022 1:39 UTC
      1 point
      Parent
      I get the sense that we are talking past each other. I wonder if part of what’s happening here is that you have a broader notion of what counts as neuromorphic hardware and brain-like AI than I did, and are therefore making a much weaker claim than I thought you were. I can’t tell for sure but some of the things you’ve said recently make me think this.
      I know that modern supercomputers are thousands of GPUs. That isn’t in conflict with what I said. I understand that on current hardware anyone able to make 1 AGI will be able to easily make many, for the reasons you mentioned.
      I’m not sure what you meant by the claims I objected to, so I’ll stop trying to argue against them. I do still stand by what I said about how energy is not currently a taut constraint, and your post sure did give the impression that you thought it was. Or maybe you were just saying you think it will eventually become one?
      - jacob_cannell 9 Jan 2022 15:53 UTC
        7 points
        Parent
        I provided some links to neuromorphic hardware research, and I sometimes lump it in with PIM (Processor in Memory) architecture. It’s an architecture where memory and compute are unified with some artificial synapse like thing—eg memristors. It’s necessarily brain-like, as the thing it’s really good at it is running (low precision) ANNs efficiently.
        
        The end of Moore’s Law is actually a series of incomplete barriers, each of which only allows an increasingly narrower computational design to scale past that barrier: dennard scaling blocked serial architectures (CPUs), next up the end of energy scaling will block von-neumman arch (GPUs/TPUs), allowing only neuromorphic/PIM to scale much further, then there is the final size scaling barrier for all reversible computation, and only exotic reversible/quantum computers scale past that.
        
        Your comment about your laptop running AGI suggested you had a different model for the min hardware requirements in terms of RAM, RAM bandwidth, and flops.
        Daniel Kokotajlo 9 Jan 2022 17:44 UTC
        2 points
        Parent
        Great. Thanks.
        The end of Moore’s Law is actually a series of incomplete barriers, each of which only allows an increasingly narrower computational design to scale past that barrier: dennard scaling blocked serial architectures (CPUs), next up the end of energy scaling will block von-neumman arch (GPUs/TPUs), allowing only neuromorphic/PIM to scale much further, then there is the final size scaling barrier for all reversible computation, and only exotic reversible/quantum computers scale past that.
        Funnily enough, if this paragraph had appeared in the original text by way of explanation for what you meant by “The future of AGI is to become more like the brain, not less.” then I would not have objected. Sorry for the misunderstanding. I do still think we have some sort of disagreement about takeoff and timelines modelling, but maybe we don’t.
        I shouldn’t have said laptop; I should have said whatever it was you said (GPUs etc.) I happen to also believe it could in principle be done on a laptop with the right knowledge (imagine God himself wrote the code) but I shouldn’t have opened that can of worms. I agree that for all practical purposes it may as well be impossible.
        Yonadav Shavit 11 Jan 2022 23:54 UTC
        3 points
        Parent
        If I had to guess at the crux between your disagreement on timelines, I think you might disagree about the FOOM process itself, but not about energy as a taut constraint to the first human-level AGI (which you both seem to agree isn’t the case). Per Jacob’s model, if a FOOM requires the AGI to quickly become much much smarter than humans, that excess smartness will inherently come with a massive electrical cost, which will cap it out at O(10^9) human-brain-equivalents until it can substantially increase world energy output. This would serve to arrest FOOM at roughly human-civilization-scale collective intelligence, except with much better coordination abilities.
        To me, this was a pretty significant update, as I was previously imagining FOOMing to not top out before it was way way past human civilization’s collective bio-compute.
        Daniel Kokotajlo 12 Jan 2022 1:02 UTC
        2 points
        Parent
        What do you mean by “arrest FOOM?” I am quite confident that by the time the intelligence explosion starts winding down, it’ll be past the point of no return for humans. Maybe from the AIs perspective progress will have stagnated due to compute constraints, and further progress will happen only once they can design exotic new hardware, so subjectively it feels like aeons of stagnation. But I think that “human-civilization-scale collective intelligence except with much better coordination abilities” is vastly underselling it. It’s like saying caveman humans were roughly elephant-scale intelligences except with better coordination abilities. Or saying that SpaceX is “roughly equivalent to the average US high school 9th grade class, except with better coordination abilities.” Do you disagree with this?
        jacob_cannell 12 Jan 2022 15:52 UTC
        4 points
        Parent
        I’m not entirely sure what you mean by “better coordination abilities”, but the primary difference between 9th graders and SpaceX employees is knowledge/training. The primary difference between elephants and caveman humans was the latter possessing language and thus technology/culture and beyond single-lifetime knowledge accumulation.
        
        AGI instances of the same shared mind/model should obviously have a coordination advantage, as should those created by the same organization, but there are many organizations that may be creating AGI.
        
        Even in a world where AGI is running on GPUs and is scale-out bound by energy use and fab output, it may be that a smaller number of larger-than-human minds trained on beyond-human experience have a strong advantage, and in general I’d expect those types of advantages to matter at least as much as ‘coordination abilities’.
        Daniel Kokotajlo 12 Jan 2022 17:20 UTC
        2 points
        Parent
        I don’t have a precise definition in mind since I was parrotting Yonadav. My point was that SpaceX is way better than a random similarly-sized group of high schoolers in many many important ways, even though SpaceX consumes just as many calories/energy as the high schoolers, such that it’s totally misleading to describe them as “roughly equivalent except that SpaceX has a massive coordination advantage.” The only thing roughly equivalent about them is their energy consumption, which just goes to show energy consumption is not a useful metric here.
        I totally agree that fewer, larger brains with experience advantages seem likely to outcompete many merely human-sized brains. In fact I think I agree with everything you said in this comment.
        Yonadav Shavit 12 Jan 2022 13:48 UTC
        3 points
        Parent
        No, I agree that coordination is the ballgame, and there’s not huge practical difference there. Entirely separately, if we were worried about a treacherous turn due to a system being way above our capabilities, this lowers the probability of that, because there are clearer signals associated with the increase in intelligence needed to out-scheme a team of careful humans. (Large compute + energy usage.) It’s not close to being a solution, but it does bound the tail of arbitrarily-pessimistic outcomes from small-scale projects suddenly FOOMing. It also introduces an additional moniterable real world effect (a spike in energy usage noticeable by energy regulation systems).
        Daniel Kokotajlo 12 Jan 2022 17:16 UTC
        2 points
        Parent
        Ah, I see. I think 10^9 is not a meaningful number to be talking about; long before there are 10^9 brain-equivalents worth of compute going into AI, we’ll be past the point of no return. But if instead you are talking about an amount of compute large enough that energy companies should be able to detect it, then yeah this seems fairly plausible. Supercomputers can’t be hidden from energy companies as far as I know, and plausibly AGI will appear first in supercomputers, so plausibly wherever AGI appears, it’ll be known by some government that the project was underweigh at least.
        I don’t think this meaningfully lowers the probability of treacherous turn due to a system being way above our capabilities though. That’s because I didn’t put much probability mass on secret-AGI-project-in-a-basement scenarios anyway. I guess if I had, then this would have updated me.
        the gears to ascension 12 Jan 2022 20:17 UTC
        1 point
        Parent
        Crypto mining would be affected significantly as well, or potentially mostly instead of, total energy use: intelligence is valuable-computation-per-watt, changing v-c-p-w changes the valuable energy spend of computers that sit idle, so you’d expect projects bidding on this to overtake cryptocurrency mining as the best use of idle computers, whether that’s due to a single project buying up computers and power, or due to a cryptocurrency energy-wasting-farm suddenly finding something directly valuable to do with their machines (and in fact it is already the case that ML can pay more than crypto mining).
        
        @jacob_cannell’s argument is simply that the brain has more to tell us about the structure of high-value-per-watt computation than expected by ai philosophers. It does not mean the brain is at the absolute limit of generalized algorithmic energy efficiency (aka the only possible generalized intelligence metric); it only means that the structure of physical limits on algorithmic energy efficiency must be obeyed by any intelligent system, and while there may be large asymptotic speedups from larger scale structure improvement, the local efficiency of the brain is nothing to shake a stick at.
        
        Perhaps ASI could be done earlier by “wasting” energy on lower value-per-watt AI projects—and in fact, there’s no reason to believe otherwise from available research progress. All AI progress that has ever occurred, after all, has been on lower generalized value-per-watt compute substrate than human brains can provide, but in return for being on thermodynamically inefficiency computers, it gets benefits that can economically compete with humans—eg via algorithmic specialization, high precision math, or exact repeatability—and thereby, AI research makes progress towards ever-increasing value-of-compute-output-per-watt.
        
        If a system is AGI, it means that it is within a constant factor of energy efficiency per watt of the human brain for nearly all tasks—potentially a large constant factor, but a constant factor nonetheless. If it’s just barely general superintelligence and is wildly inefficient at small scales, then the only possible way it could be superintelligence is because it scales (maybe just barely) better than the brain with problem difficulty—extracting asymptotically better value-per-watt than an equivalently scaled system of humans consuming the same number of watts, due to what must ground out to improved total-system-thermodynamic-efficiency-per-unit-useful-computation.
        
        Your proposal seems to be that we should expect a large scale multi-agent AI system to be superintelligence in this larger-scale asymptotic respect, despite that the human brain has shockingly high interconnect-efficiency and basic thermal compute efficiency. I have no disagreement. What this does tell us is that deep learning doesn’t have a unique expected qualitative advantage nor expected qualitative disadvantage vs the brain. if it becomes able to find more energy-efficient energy routes through its processing substrate’s spacetime (ie more energy efficient algorithms) (ie more intelligent algorithms), then it wins. predicting when that will happen, which teams are close, and guaranteeing safety becomes the remaining issue: guaranteeing that the resulting system does not cause mass energy-structure-aka-data loss (eg, death, body damage, injury, memory loss, hdd corruption/erasure, failure to cryonically freeze as-yet-unrepairable beings, etc) nor interfere significantly with the values of living beings (torture, energy-budget squeeze, cryonic freezing of beings who wish to continue operating, etc).
        
        (due to the cycles seen in evolutionary game theory, I suspect that an unsafe or bad-at-distributed-systems-fairness AGI mega-network will moderately quickly collapse with similar high-defection-rate issues to the human society we have; and if it exterminates and then succeeds humanity, I’d guess it will eventually evolve a large scale cooperative system again; but there’s no reason to believe it wouldn’t kill us first. friendly multi-agent systems are the hardest part of this whole thing, IMO.)