Yes, you need some kind of switch for any mechanical computer. My point was that you need multiple mechanical “amplifiers” for each single positioner arm, that the energy usage of that would be substantial, and that if you have a binary mechanical switch controlling a relatively large movement, thermal noise will put it in an intermediate state a lot of the time, so the arm position will be off.
That’s not how computers work (neither the ones we have today nor the proposed rod logic ones). Each rod or wire represents a single on/off bit.
Yes, doing mechanosynthesis is more complicated, and precise sub-nm control of a tooltip may not be competitive with biology for self-replication. But if the AI wants a substrate to think on that can implement lots of FLOPs, then molecular rod logic will work.

For that matter, protein-based mechanical or hybrid electromechanical computers are plausible, likely with lower energy consumption per erased bit than neurons and certainly with higher density. Human-built computers have nm-sized transistors. There’s no reason to think that neurons and synapses are the most efficient sort of biological computer.
> There’s no reason to think that neurons and synapses are the most efficient sort of biological computer.

Bio-neuron-based brains are extremely efficient and close to Pareto-optimal. We are near the end of Moore’s law, and the viable open routes for forward progress in energy efficiency are essentially neuromorphic.

Edit: continued partially in the original article.
That post makes a fundamental error about wiring energy efficiency by ignoring the 8 OOM difference in electrical conductivity between neuron saltwater and copper (0.5 S/m vs 50 MS/m).

There’s almost certainly a factor of ~100 energy efficiency gain to be had by switching from saltwater to copper in the brain and reducing capacitance by thinning the wires. I’ll be leaving a comment soon, but that had to be said.
The energy/bit/(linear distance) agreement points to an underlying principle of “if you’ve thinned the wires, why haven’t you packed everything in tighter?”, leading to similar capacitance and therefore similar energy per bit per unit length.

Face-to-face die stacking results suggest that computers could be much more efficient if they weren’t limited to 2D packing of logic elements. A second logic layer more than halved power consumption at the same performance, and that’s with limited interconnect density between the two logic dies.
The Cu ↔ saltwater conductivity difference leads to better utilisation of wiring capacitance to reduce thermal noise voltage at transistor gates. Concretely, there are more electrons able to effectively vote on the output voltage. For very short interconnects this matters less, but long-distance or high-fanout nodes have lots of capacitance, and low-resistance wires make the voltage much more stable.
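To put a number on “more electrons voting”: the equilibrium thermal noise voltage on a capacitive node is √(kT/C), independent of resistance; what low-resistance wiring buys you is the ability to charge a large-capacitance node quickly enough to actually exploit that stability. A minimal sketch, where the 1 fF and 100 fF capacitances are illustrative assumptions rather than figures from the thread:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # room temperature, K

def ktc_noise_volts(c_farads: float) -> float:
    """RMS thermal (kT/C) noise voltage across a capacitance C."""
    return math.sqrt(k_B * T / c_farads)

for label, c in [("short local wire, ~1 fF", 1e-15),
                 ("long / high-fanout net, ~100 fF", 100e-15)]:
    print(f"{label}: ~{ktc_noise_volts(c) * 1e3:.2f} mV RMS thermal noise")
# ~2 mV vs ~0.2 mV: the larger node is far more stable against thermal noise,
# provided the wiring resistance is low enough to drive that capacitance quickly.
```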
Electrical conduction through “neuron saltwater” is not how neuronal interconnect works; it’s electrochemical. You are simply mistaken: copper interconnect wire energy limits and neuron wire energy efficiency limits are essentially the same, and both approach the theoretical Landauer minimum, as explained in the article.
Mandatory footnote for this comment:

The Landauer limit puts the energy cost to erase a bit at about 0.02 eV at room temperature. For comparison, the energy in a single photon of visible light is about 1 eV. Already we can see that the brain is not going to get anywhere close to this: 1 eV is a molecular energy scale, not a cellular one.
The brain requires about 20 Watts of power. Running this directly through the Landauer limit, we get ~10^21 bits erased per second. For comparison, the number of synapses is about 2×10^14 (pulled from jacob_cannell’s post linked above), and this gives about 600 kB of data erased per synapse per second. This is not a reasonable number! It’s justified in the post by assuming that we’re banned from using regular digital logic to implement binary arithmetic and are instead forced into using heaps of “counters” where the size of the heap is the number you’re representing, and this comes along with shot noise, of course.
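A quick sanity check of that arithmetic, as a sketch (the 20 W and 2×10^14-synapse figures are the ones used above):

```python
import math

k_B = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                   # K, room temperature as in the comment
eV = 1.602176634e-19        # J per electron-volt

landauer_J = k_B * T * math.log(2)
print(f"Landauer limit: {landauer_J:.2e} J = {landauer_J / eV:.3f} eV per bit")  # ~0.018 eV

brain_watts = 20.0
synapses = 2e14
bits_per_s = brain_watts / landauer_J          # ~7e21 erasures/s
bytes_per_synapse = bits_per_s / synapses / 8  # ~4e6 bytes/s
print(f"{bits_per_s:.1e} bit erasures/s; ~{bytes_per_synapse / 1e6:.1f} MB/s per synapse")
# The comment rounds this down to ~10^21 erasures/s and ~600 kB/s per synapse;
# the unrounded figure is several times higher, which only strengthens the point.
```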
The section on “interconnect” similarly assumes that we’re forced to dissipate a certain amount of energy per bit transferred per unit length of interconnection. We’re banned from using superconducting interconnect, or any other creative solution here. Also, if we could shrink everything, the required length of interconnect would be shorter, but the post just does the calculation for things being normal brain size.
I’d further argue that, even if interconnect requirements are, as a matter of engineering practicality, close to the limits of what we can build, we should not confuse that with being “close to the thermodynamic limits”. Moving a bit from here to there should have no thermodynamic cost, and if we can’t manage it except by dissipating a huge amount of energy, then that’s a fact about our engineering skills, not a fact about the amount of computation the brain is doing.
In short, if you assume that you have to do things the way the brain does them, then the brain is somewhat close to “thermodynamic limits”, but without those assumptions it’s nowhere near the actual Landauer limit.
> The Landauer limit puts the energy cost to erase a bit at about 0.02 eV at room temperature.
No it does not—that is one of many common layman misunderstandings, which the article corrects. The practical Landauer limit (for fast reliable erasures) is closer to 1eV.
> It’s justified in the post by assuming that we’re banned from using regular digital logic to implement binary arithmetic and are instead forced into using heaps of “counters”
Digital multipliers use similar or more energy for low precision multiply but are far larger, as discussed in the article with numerous links to research literature. (And most upcoming advanced designs for approaching brain energy efficiency use analog multipliers—as in memristor crossbar designs).
> The section on “interconnect” similarly assumes that we’re forced to dissipate a certain amount of energy per bit transferred per unit length of interconnection.
That is indeed how conventional computing works.
> Also, if we could shrink everything, the required length of interconnect would be shorter, but the post just does the calculation for things being normal brain size.
You obviously didn’t read the post as indeed it discusses this - see the section on size and temperature.
> Moving a bit from here to there should have no thermodynamic cost, and if we can’t manage it except by dissipating a huge amount of energy, then that’s a fact about our engineering skills,
As discussed in the post—you absolutely can move bits without dissipating much energy using reversible interconnect (i.e. optics), but this does not come without enormous fundamental disadvantages in size.
> No it does not—that is one of many common layman misunderstandings, which the article corrects.
> The practical Landauer limit (for fast reliable erasures) is closer to 1eV.
So this is how the 1eV value is derived, right? Start with a bit that we want to erase. Set things up so there’s an energy gap of ΔE between the 0 state and the 1 state. Then couple to the environment, and wait for some length of time, so the probability that the bit has a value of 0 becomes:
$$\frac{1}{1+e^{-\beta\Delta E}}$$
This is the probability of successful erasure, and if we want to get a really high probability, we need to set ΔE=50kT or something like that.
But instead imagine that we’re trying to erase 100 bits all at once. Now we set things up so that the 2^100 − 1 bit strings that aren’t all zeros have an energy of ΔE and the all-zeros bit string has an energy of 0. Now if we couple to the environment, we get the following probability of successful erasure of all the bits:
$$\frac{1}{1+(2^{100}-1)\,e^{-\beta\Delta E}}$$
This is approximately equal to:
$$\frac{1}{1+2^{100}\,e^{-\beta\Delta E}} = \frac{1}{1+e^{100\log 2\,-\,\beta\Delta E}}$$
Now, to make the probability of successful erasure really high, we can pick:
$$\Delta E = 50\,kT + 100\,(kT\log 2)$$
The 100 (kT log 2) term is there to cancel the 100 log 2 in the exponent; this is just the familiar Landauer limit. And the 50 kT is there to make sure that we get the same level of reliability as before. But now that 50 kT is amortized over 100 bits, the extra reliability cost per bit is much less. So if I’m not wrong, the theoretical limit per bit should still be kT log 2.
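A numerical check of this amortization argument, as a sketch (using the same 50 kT reliability margin chosen above):

```python
import math

margin = 50.0  # reliability margin in units of kT, as chosen above
n = 100        # number of bits erased jointly

# Single-bit erasure with ΔE = 50 kT: relative weight of the one wrong state.
x_single = math.exp(-margin)

# Joint erasure of n bits with ΔE = 50 kT + n·kT·log 2: 2^n − 1 wrong states.
dE_joint = margin + n * math.log(2)
x_joint = (2**n - 1) * math.exp(-dE_joint)

print(f"P(failure), single bit: {x_single / (1 + x_single):.3e}")
print(f"P(failure), {n} bits  : {x_joint / (1 + x_joint):.3e}")   # same reliability
print(f"energy per bit, single: {margin:.1f} kT")
print(f"energy per bit, joint : {dE_joint / n:.2f} kT")  # ~1.2 kT, approaching kT·log 2 ≈ 0.69 kT
```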
The article has links to the 3 good sources (Landauer, Zhirnov, Frank) for this derivation. I don’t have time to analyze your math in detail but I suspect you are starting with the wrong setup—you need a minimal energy well to represent a bit stably against noise at all, and you pay that price for each bit, otherwise it isn’t actually a bit.
My prior that you find an error in the physics lit here is extremely low—this is pretty well established at this point.
I’ve taken a look at Michael P. Frank’s paper and it doesn’t seem like I’ve found an error in the physics lit. Also, I still 100% endorse my comment above: The physics is correct.
So your priors check out, but how can both be true?
> you need a minimal energy well to represent a bit stably against noise at all, and you pay that price for each bit, otherwise it isn’t actually a bit.

To use the terminology in Frank, this is E_sig you’re talking about. My analysis above applies to E_diss. Now in section 2 of Frank’s paper, he says:

> With this particular mechanism, we see that E_diss = E_sig; later, we will see that in other mechanisms, E_diss can be made much less than E_sig.

The formula kT log r shows up in section 2, before Frank moves on to talking about reversible computing. In section 3, he gives adiabatic switching as an example of a case where E_diss can be made much smaller than E_sig. (Though other mechanisms are also possible.) About midway through section 4, Frank uses the standard kT log 2 value, since he’s no longer discussing the restricted case where E_diss = E_sig.
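For concreteness on how E_diss can fall below E_sig: in adiabatic (ramped) charging of a capacitive node through a resistance, the dissipation per switching event is roughly (RC/τ)·CV² for a ramp time τ much longer than RC, versus about ½CV² for abrupt switching, while the stored signal energy is still about ½CV². A minimal sketch; the R, C, V, and ramp-time values are arbitrary illustrative assumptions, not numbers from Frank’s paper or the thread:

```python
# Abrupt vs adiabatic (ramped) charging of a capacitive node through a resistor.
R = 1e3        # ohms
C = 10e-15     # farads (10 fF node)
V = 1.0        # volts

E_sig = 0.5 * C * V**2       # signal energy stored on the node
E_abrupt = 0.5 * C * V**2    # step charging dissipates ~½CV², independent of R

for ramp in (1e-9, 1e-8, 1e-7):                 # ramp times >> RC = 10 ps
    E_adiabatic = (R * C / ramp) * C * V**2     # ≈ (RC/τ)·CV² for τ >> RC
    print(f"ramp {ramp:.0e} s: E_diss ≈ {E_adiabatic:.1e} J "
          f"= {E_adiabatic / E_sig:.3f} × E_sig")
print(f"abrupt switching : E_diss ≈ {E_abrupt:.1e} J = 1.000 × E_sig")
```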
Adiabatic computing is a form of partial reversible computing.

If you can only erase bits 100 at a time, you don’t really have 100 bits, do you? Now your thermal state just equalizes probabilities across those nonzero bit strings.

> You obviously didn’t read the post as indeed it discusses this—see the section on size and temperature.
That point (compute energy/system surface area) assumes we can’t drop clock speed. If cooling were the binding constraint, drop clock speed and now we can reap gains in efficiency from miniaturization.

Heat dissipation scales linearly with size for a constant ΔT. Shrink a device by a factor of ten and the driving thermal gradient increases in steepness by ten, while the cross-sectional area of the material conducting that heat goes down by 100x. So if thermals are the constraint, then scaling linear dimensions down by 10x requires reducing power by 10x or switching to some exotic cooling solution (which may be limited in the improvement OOMs achievable).

But if we assume constant energy per bit·(linear distance), reducing wire length by 10x cuts power consumption by 10x. Only if you want to increase clock speed by 10x (since propagation velocity is unchanged and signals travel less distance) does power go back up. In fact, wire thinning to reduce propagation speed gets you a small amount of added power savings.
All that assumes the logic will shrink which is not a given.
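A compact restatement of the scaling argument above, as a sketch under its stated assumptions (conductive cooling at fixed ΔT, fixed energy per bit per unit wire length; all quantities normalized to 1 at the original size):

```python
# Scaling sketch: shrink every linear dimension by s while holding ΔT and
# energy/bit/(linear distance) fixed.  Numbers are ratios, not absolute values.
def scaled(s: float, clock_multiplier: float = 1.0) -> None:
    heat_removal = 1.0 / s                              # conduction ∝ area/length
    interconnect_power = clock_multiplier / s           # shorter wires, maybe faster clock
    surface_power_density = interconnect_power * s**2   # surface area fell by s²
    print(f"s={s:g}, clock x{clock_multiplier:g}: "
          f"removable heat x{heat_removal:g}, power x{interconnect_power:g}, "
          f"surface power density x{surface_power_density:g}")

scaled(10.0)                         # power and removable heat both fall 10x
scaled(10.0, clock_multiplier=10.0)  # power returns to 1x while removable heat fell 10x
```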
Added points regarding cooling improvements:
- Brain power density of 20 mW/cc is quite low.
- ΔT is pretty small (single-digit °C).
- Switching to temperature-tolerant materials for higher ΔT gives 1-1.5 OOM.
- Phase-change cooling gives another ~1 OOM.
- Increasing pump power/coolant volume is the biggie, since even a few MPa is doable without being counterproductive or increasing the power budget much (2-3 OOM).
- Even if cooling is a hard binding constraint, if interconnect density increases, one can downsize a bit less and devote more volume to cooling.
The brain is already at the minimal viable clock rate.

Your comment now seems largely in agreement: reducing wire length 10x cuts interconnect power consumption by 10x, but surface area decreases 100x, so surface power density increases 10x. That would result in a ~3x increase in temp/cooling demands, which is completely unviable for a bio brain constrained to room temp and already using active liquid cooling and the entire surface of the skin as a radiator.
Digital computers of course can—and do—go much denser/hotter, but that ends up ultimately costing more energy for cooling.
So anyway the conclusion of that section was:
> Conclusion: The brain is perhaps 1 to 2 OOM larger than the physical limits for a computer of equivalent power, but is constrained to its somewhat larger than minimal size due in part to thermodynamic cooling considerations.
What sets the minimal clock rate? Increasing wire resistance and reducing the number of ion channels and pumps proportionally should just work (ignoring leakage).

It is certainly tempting to run at higher clock speeds (serial thinking speed is a nice feature), but if miniaturization can be done and clock speeds then have to be limited for thermal reasons, why can’t we just do that?

That aside, is miniaturization out of the question (i.e. logic won’t shrink)? Is there a lower limit on the number of charge carriers for synapses to work?

Synapses are around 1 µm³, which seems big enough to shrink down a bit without weird quantum effects ruining everything. Humans have certainly made smaller transistors, or memristors for that matter. Perhaps some of the learning functionality needs to be stripped, but we do inference on models all the time without any continuous learning and that’s still quite useful.
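A rough order-of-magnitude take on the charge-carrier question, as a sketch: assuming the textbook ~1 µF/cm² specific membrane capacitance and ~0.1 V signal swings (both imported assumptions, not figures from the thread), a ~1 µm membrane patch moves thousands of elementary charges per event, while a 10x-shrunk ~100 nm patch would be down to tens, where shot noise starts to become a noticeable fraction of the signal:

```python
import math

# Elementary charges moved by a membrane patch for a ~0.1 V signal swing.
C_SPECIFIC = 1e-6       # F/cm², specific membrane capacitance (assumption)
V_SWING = 0.1           # V, typical signal swing (assumption)
Q_E = 1.602176634e-19   # C, elementary charge

def charges_for_patch(side_nm: float) -> float:
    area_cm2 = (side_nm * 1e-7) ** 2       # side in nm -> cm, then squared
    return C_SPECIFIC * area_cm2 * V_SWING / Q_E

for side_nm in (1000, 100):   # ~current synapse scale vs a 10x linear shrink
    n = charges_for_patch(side_nm)
    print(f"{side_nm:>4} nm patch: ~{n:,.0f} charges, "
          f"shot-noise fraction ~{1 / math.sqrt(n):.1%}")
```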
Signal propagation is faster in larger axons.

Evolutionary arms races: i.e. the need to think quickly to avoid becoming prey, think fast enough to catch prey, etc.

> That aside, is miniaturization out of the question (i.e. logic won’t shrink)? Is there a lower limit on the number of charge carriers for synapses to work?

The prime overall size constraint may be surface/volume ratios and temp, as we already discussed, but yes, synapses are already pretty minimal for what they do (they are analog multipliers and storage devices).

Synapses are equivalent to entire multipliers + storage devices + some extra functions, far more than transistors.

You might find this post interesting.