The brain is already near optimal in terms of what can be done for 10 watts with any irreversible computer (this is relatively easy to show from wiring energy analysis).
I should probably rephrase the brain optimality argument, as it isn't just about energy per se. The brain is on the Pareto efficiency surface: it is optimal with respect to some complex tradeoffs between area/volume, energy, and speed/latency.
Energy is the dominant constraint, so the brain is much closer to those limits than to the others. The typical futurist understanding of the Landauer limit is not even wrong; it is way off, as I point out in my earlier reply below and related links.
A consequence of the brain being near optimal for energy of computation for intelligence, given its structure, is that it is also near optimal in terms of intelligence per switching event.
The brain computes with just around 10^14 switching events per second (10^14 synapses * 1 Hz average firing rate). That 1 Hz figure is something of an upper bound for the average firing rate.[1]
The typical synapse is very small, has a low SNR and is thus equivalent to a low-bit op, and only activates maybe 25% of the time.[2] We can roughly compare these minimal-SNR analog ops with the high-precision single-bit ops that digital transistors implement. The Landauer principle allows us to rate them as reasonably equivalent in computational power.
So the brain computes with just 10^14 switching events per second. That is essentially miraculous. A modern GPU uses perhaps 10^18 switching events per second.
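A rough sketch of that comparison (the synapse count and firing rate are the estimates above; the GPU transistor count, clock, and activity factor are illustrative assumptions chosen to land near the quoted ~10^18 figure):

```python
# Rough comparison of switching events per second, using the estimates in this
# thread. The GPU numbers below are illustrative assumptions, not measurements.

brain_synapses = 1e14        # ~10^14 synapses
avg_rate_hz = 1.0            # ~1 Hz average firing rate (roughly an upper bound)
brain_events = brain_synapses * avg_rate_hz          # ~1e14 events/s

gpu_transistors = 1e10       # ~10 billion transistors (rough, assumed)
gpu_clock_hz = 1e9           # ~1 GHz
activity_factor = 0.1        # assume ~10% of transistors switch each cycle
gpu_events = gpu_transistors * gpu_clock_hz * activity_factor   # ~1e18 events/s

print(f"brain: ~{brain_events:.0e} switching events/s")
print(f"GPU:   ~{gpu_events:.0e} switching events/s")
print(f"GPU/brain ratio: ~{gpu_events / brain_events:.0e}")
```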
So the important thing here is not just energy but overall circuit efficiency. The brain is remarkably efficient, and as far as we can tell near optimal, in its use of computation towards intelligence.
This explains why our best SOTA techniques in almost all AI are some version of brain-like ANNs (the key defining principle being search/optimization over circuit space). It predicts that the best we can do for AGI is to reverse engineer the brain. Yes eventually we will scale far beyond the brain, but that doesn’t mean that we will use radically different algorithms.
A consequence of the brain being near optimal for energy of computation for intelligence, given its structure, is that it is also near optimal in terms of intelligence per switching event.
So the brain computes with just 10^14 switching events per second.
What do you mean by "given its structure"? Does this still leave open the possibility that a brain with some differences in organization could get more intelligence out of the same number of switching events per second?
Similarly, I assume the same argument applies to all animal brains. Do you happen to have stats on the number of switching events per second for e.g. the chimpanzee?
EDIT: see this comment and this comment on reddit for some references on circuit efficiency.
Computers are circuits and thus networks/graphs. For primitive devices the switches (nodes) are huge, so they use up significant energy. For advanced devices the switches are not much larger than the wires, and the wire energy dominates. If you look at the cross section of a modern chip, it contains a hierarchy of metal layers of decreasing wire size, with the transistors at the bottom. A side-view section of the cortex looks similar, with vasculature and long-distance wiring taking the place of the upper metal layers.
The vast majority of the volume in both modern digital circuits and brain circuits consists of wiring. The transistors and the synapses are just tiny little things in comparison.
Modern computer memory systems have a wire energy efficiency of around 10^-12 to 10^-13 J/bit/mm. The limit for reliable signals is perhaps only 10x better. I think the absolute limit for unreliable bits is 10^-15 or so; I will check the citation for that when I get home. Wire energy efficiency for bandwidth is not improving at all and hasn't since the '90s. The next big innovation is simply moving the memory closer; that's about all we can do.
The minimum wire energy is close to that predicted by a simple model of a molecular wire in which each molecule-sized 1 nm section is a switch: 10^-19 to 10^-21 J per section times 10^6 sections per mm gives 10^-13 to 10^-15 J/bit/mm. In reality of course it's somewhat more complex: smaller wires actually dissipate more energy, but also require less energy to represent a signal.
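Making that arithmetic explicit (a minimal sketch, using the per-section energy range and 1 nm section size quoted above):

```python
# Molecular-wire model of minimum wire energy: treat each ~1 nm section of the
# wire as a switch, charging 1e-21 J (kT-scale, unreliable bit) to 1e-19 J
# (~100 kT-scale, reliable bit) per section.

energy_per_section_low = 1e-21    # J, unreliable-bit scale
energy_per_section_high = 1e-19   # J, reliable-bit scale
sections_per_mm = 1e-3 / 1e-9     # 1 mm of wire / 1 nm per section = 1e6

low = energy_per_section_low * sections_per_mm    # ~1e-15 J/bit/mm
high = energy_per_section_high * sections_per_mm  # ~1e-13 J/bit/mm
print(f"model wire energy: ~{low:.0e} to ~{high:.0e} J/bit/mm")
# Compare: modern memory systems sit at roughly 1e-13 to 1e-12 J/bit/mm.
```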
Also keep in mind that synapses are analog devices which require analog impulse inputs and outputs—they do more work than a single binary switch.
So Moore's law is ending and we are already pretty close to the limits of wire efficiency. If you add up the wiring paths in the brain you get a similar estimate. Axons/dendrites appear to be at least as efficient as digital wires and are thus near optimal. None of this should be surprising: biological cells are energy-optimal true nanocomputers. Neural circuits evolved from the bottom up; there was never a time at which they were inefficient.
However, it is possible to avoid wire dissipation entirely with some reversible signal path. Optics is one route, but photons and thus photonic devices are impractically large. The other option is superconducting circuits, which work in labs but have far too many disadvantages to be practical yet. Eventually cold superconducting reversible computers could bypass the energy issues, but that tech appears to be far off.
The other option is superconducting circuits, which work in labs but have far too many disadvantages to be practical yet. Eventually cold superconducting reversible computers could bypass the energy issues, but that tech appears to be far off.
What about just replacing the copper wire inside a conventional CMOS chip with a superconductor? It took some searching, but I managed to find a paper titled Cryogenically Cooled CMOS which talks about the benefits and feasibility of doing this. Quoting from the relevant section:
If lower interconnect resistance improves performance, the use of 'zero-resistance' superconductors should provide the ultimate in performance improvement. Unfortunately, although performance improvements would be expected, they would not be as great as the simplistic statement above suggests. Furthermore, several technical obstacles remain before high-temperature superconductors (HTS) can be effectively integrated with VLSI technology.

Actually, as we will see, the resistance of superconducting films is not truly zero, except in the limits of zero frequency or zero temperature. Nevertheless, at 77 K and 1 GHz, measurements on patterned YBa2Cu3O7-x (YBCO) films have already demonstrated surface resistances one to two orders of magnitude below those for Cu under the same conditions. Theoretical predictions for YBCO suggest four orders of magnitude would be possible. Unfortunately, good-quality (epitaxial) YBCO films grow best on perovskite substrates having high dielectric constants. Lanthanum aluminate (LaAlO3), which is a popular substrate for HTS microwave circuits, has a relative dielectric constant of 25. Assuming the same interconnect geometry, this makes all capacitances more than 6× greater than would be the case for a SiO2 dielectric. Thus, some of the low-resistance benefits of HTS films are cancelled by the high dielectric constants of their associated substrates.
So it looks like there’s no fundamental reason why it couldn’t be done, just a matter of finding the right substrate material and solving other engineering problems.
What about just replacing the copper wire inside a conventional CMOS chip with a superconductor?
That is the type of tech I was referring to with superconducting circuits as a precursor to fully reversible computing. From what I understand, if you chill everything down then you also change resistance in the semiconductor along with all the other properties, so it probably isn't as easy as just replacing the copper wires.
A room temperature superconductor circuit breakthrough is one of the main wild cards over the next decade or so. Cryogenic cooling is pretty impractical for mainstream computing.
So it looks like there’s no fundamental reason why it couldn’t be done, just a matter of finding the right substrate material and solving other engineering problems.
Yeah, it's just a question of timetables. If it's decades away, we have a longer period of stalled Moore's law during which AGI surpasses the brain slowly rather than rapidly.
From what I understand, if you chill everything down then you also change resistance in the semiconductor along with all the other properties, so it probably isn’t as easy as just replacing the copper wires.
From the sources I've read, there aren't any major issues running CMOS at 77 K; you only run into problems at lower temperatures, below about 40 K. I guess people aren't seriously trying this because it's probably not much harder to go directly to full superconducting computers (i.e., with logic gates made out of superconductors as well), which offers a lot more benefits. Here is an article about a major IARPA project pursuing that. It doesn't seem safe to assume that we'll get AGI before we get superconducting computers. Do you disagree? If so, can you explain why?
There was similar interest in superconducting chips about a decade ago, and it was pretty much the same story: DARPA/IARPA spearheading the research, with the major customer being US intelligence.
The 500 gigaflops per watt figure is about 100 times more computation/watt than on a current GPU—which is useful because it shows that about 99% of GPU energy cost is interconnect/wiring.
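The arithmetic behind that inference (the ~5 GFLOPS/W figure for a current GPU-based system is the Green500 number quoted elsewhere in this thread):

```python
# If superconducting logic reaches ~500 GFLOPS/W while a current GPU system
# manages ~5 GFLOPS/W, then switching the logic accounts for only ~1% of GPU
# energy per op; the remaining ~99% is interconnect/wiring and data movement.

gpu_gflops_per_watt = 5.0
superconducting_gflops_per_watt = 500.0

energy_per_op_gpu = 1.0 / gpu_gflops_per_watt
energy_per_op_sc = 1.0 / superconducting_gflops_per_watt
wiring_fraction = 1.0 - energy_per_op_sc / energy_per_op_gpu
print(f"implied interconnect share of GPU energy: ~{wiring_fraction:.0%}")  # ~99%
```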
In terms of viability and impact, it is still uncertain how much funding superconducting circuits will require to become competitive. And even if the tech is competitive in some markets, say for the NSA, that doesn't make it competitive for general consumer markets. Cryogenic cooling means these things will only work in very special data rooms, so the market is more niche.
The bigger issue though is total cost competitiveness. GPUs are sort of balanced in that energy cost is about half of the TCO (total cost of ownership). It is extremely unlikely that superconducting chips will be competitive in total cost of computation in the near future. All the various tradeoffs in a superconducting design and the overall newness of the tech imply lower circuit densities. A smaller market implies less research amortization and higher costs. Even if a superconducting chip used zero energy, it would still be much more expensive and provide fewer ops/$.
Once we run out of scope for further CPU/GPU improvements over the next decade, the TCO budget will shift increasingly towards energy, and these types of chips will become increasingly viable. So I'd estimate that the probability of impact in the next 5 years is small, but 10 years or more out it's harder to say. To make a more reliable forecast I'd need to read more on this tech and understand more about the costs of cryogenic cooling.
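A toy TCO comparison to illustrate the point (all numbers are hypothetical placeholders, not estimates of real costs):

```python
# Toy TCO (total cost of ownership) comparison. If energy is ~half of a GPU's
# TCO, a zero-energy chip can at best halve cost per op, and any hardware-cost
# or density penalty quickly eats that gain. All values below are hypothetical.

gpu_hw_cost, gpu_energy_cost = 1.0, 1.0   # normalized: energy ~ half of TCO
gpu_throughput = 1.0                      # normalized ops/s

sc_hw_cost_multiplier = 5.0   # hypothetical: exotic process, small market
sc_density_penalty = 0.5      # hypothetical: lower circuit density
sc_hw_cost = gpu_hw_cost * sc_hw_cost_multiplier
sc_energy_cost = 0.0          # even granting zero energy use
sc_throughput = gpu_throughput * sc_density_penalty

gpu_ops_per_dollar = gpu_throughput / (gpu_hw_cost + gpu_energy_cost)   # 0.50
sc_ops_per_dollar = sc_throughput / (sc_hw_cost + sc_energy_cost)       # 0.10
print(f"GPU ops/$: {gpu_ops_per_dollar:.2f}, superconducting ops/$: {sc_ops_per_dollar:.2f}")
```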
But very roughly, the net effect of this could be to add another leg to Moore's-law-style growth, at least for server computation.
I guess people aren't seriously trying this because it's probably not much harder to go directly to full superconducting computers (i.e., with logic gates made out of superconductors as well), which offers a lot more benefits.
It takes energy to maintain cryogenic temperatures, probably much more than the energy that would be saved by eliminating wire resistance. If I understand correctly, the interest in superconducting circuits is mostly in using them to implement quantum computation. Barring room-temperature superconductors, there are probably no benefits to using superconducting circuits for classical computation.
From the article I linked to:

Studies indicate the technology, which uses low temperatures in the 4-10 kelvin range to enable information to be transmitted with minimal energy loss, could yield one-petaflop systems that use just 25 kW and 100 petaflop systems that operate at 200 kW, including the cryogenic cooler. Compare this to the current greenest system, the L-CSC supercomputer from the GSI Helmholtz Center, which achieved 5.27 gigaflops-per-watt on the most-recent Green500 list. If scaled linearly to an exaflop supercomputing system, it would consume about 190 megawatts (MW), still quite a bit short of DARPA targets, which range from 20MW to 67MW.
ETA: 100 petaflops per 200 kW equals 500 gigaflops per watt, so it’s estimated to be about 100 times more energy efficient.
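The same arithmetic spelled out, using only the figures from the quote:

```python
# Arithmetic behind the 500 GFLOPS/W figure, using the numbers in the quote above.

petaflops = 100.0
power_kw = 200.0                       # includes the cryogenic cooler
gflops = petaflops * 1e6               # 1 PF = 1e6 GFLOPS
watts = power_kw * 1e3
gflops_per_watt = gflops / watts
print(f"projected efficiency: {gflops_per_watt:.0f} GFLOPS/W")        # 500

lcsc_gflops_per_watt = 5.27            # greenest conventional system in the quote
print(f"improvement over L-CSC: ~{gflops_per_watt / lcsc_gflops_per_watt:.0f}x")  # ~95x
```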
As the efficiency of a logically irreversible computer approaches the Landauer limit, its speed must approach zero, for the same reason that a heat engine's power output must approach zero as its efficiency approaches the Carnot limit.
I don’t have an equation at hand, but I wouldn’t be surprised if it turned out that biological neurons operate close to the physical limit for their speed.
EDIT:
I found this Physics Stack Exchange answer about the thermodynamic efficiency of human muscles.
Hmm… after more searching, I found this page, which says:
The faster the processor runs, the larger the energy required to maintain the bit in the predefined 1 or 0 state. You can spend a lot of time arguing about a sensible value but something like the following is not too unreasonable: The Landauer switching limit at finite (GHz) clock speed:
Energy to switch 1 bit > 100 k_B T ln(2)
So biological neurons still don't seem to be near the physical limit, since they fire at only around 100 Hz and, according to my previous link, dissipate millions to billions of times more than k_B T ln(2).
A 100 kT signal is only reliable for a distance of a few nanometers. The energy cost is all in pushing signals through wires. So the synapse signal is a million times larger than 100 kT in order to cross a distance of around 1 mm or so, which works out to about 10^-13 J per synaptic event. Thus 10 watts for 10^14 synapses and a 1 Hz rate. For a 100 Hz rate, the average distance would need to be less.
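A sketch of that budget (assuming k_B T at roughly body temperature; the ~1 mm distance, 10^14 synapses, and ~1 Hz rate are the round figures used in this thread):

```python
import math

# Synaptic wire-energy budget: a ~100 kT signal is reliable over only a few nm,
# so driving it ~1 mm costs roughly 1e6 times more per event.

k_B = 1.380649e-23                 # Boltzmann constant, J/K
T = 310.0                          # K, roughly body temperature
kT = k_B * T                       # ~4.3e-21 J
landauer_bit = kT * math.log(2)    # ~3e-21 J, minimum per irreversible bit erasure
reliable_signal = 100 * kT         # ~4.3e-19 J, reliable over only a few nm

per_event = reliable_signal * 1e6  # ~4e-13 J to push the signal ~1 mm (~1e6 nm)
brain_power = 1e-13 * 1e14 * 1.0   # using the rounded 1e-13 J/event figure: ~10 W

print(f"per synaptic event: ~{per_event:.0e} J (order 1e-13 J)")
print(f"implied compute power: ~{brain_power:.0f} W")
print(f"ratio to Landauer limit per bit: ~{per_event / landauer_bit:.0e}x")  # ~1e8
```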
Not my field of expertise, but I don't understand where this bound comes from. In this paper, for short erasure cycles, they find an exponential law, although they don't give the constants (I suppose they are system-dependent).
Do you have a citation for this? My understanding is that biological neural networks operate far from the Landauer Limit (sorry I couldn’t find a better citation but this seems to be a common understanding), whereas we already have proposals for hardware that is near that limit.
Ok, I guess whether the cryogenic cooling pays for itself depends on how big your computer is, due to the square-cube law: the cooling overhead scales with surface area while the computation scales with volume, so bigger computers would be at an advantage.