OK, maybe you want to build some kind of mechanical computer too. Clearly life doesn’t require that to operate, but does it even work? Consider a mechanical computer indicating a position: it holds some number, and the high bit corresponds to a large positional difference, which means you need a long lever, and then the force is too weak, so you’d need some mechanical amplifier. So that’s a problem.
Drexler absolutely considered thermal noise. Rod logic uses rods at right angles whose positions allow or prevent movement of other rods. That’s the amplification: a small force moving one rod can control a larger force applied later to a blocked rod.
http://www.nanoindustries.com/nanojbl/NanoConProc/nanocon2.html#anchor84400
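To make the interlock idea concrete, here is a toy logical model (an illustration under assumed conventions, not Drexler’s actual gate geometry): a driven “probe” rod can advance only if no gate knob on an input rod sits in its channel, so the strongly driven probe motion is controlled by the small motions that set the inputs. With the blocking convention assumed below, the probe computes NOR of its inputs.

```python
# Toy logical model of a rod-logic interlock. Illustrative sketch under
# assumed conventions (extended input = knob blocks the probe channel),
# not the specific gate geometry from Nanosystems.

def probe_advances(inputs_extended: list[bool]) -> bool:
    """The strongly driven probe rod moves only if no input knob blocks it."""
    return not any(inputs_extended)  # i.e. NOR of the inputs

# A two-input interlock then behaves as a NOR gate:
for a in (False, True):
    for b in (False, True):
        print(f"A={a!s:5} B={b!s:5} probe moves: {probe_advances([a, b])}")
```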
Drexler’s calculations concern the thermal excitation of vibrations in logic rods, not the thermal excitation of their translational motion. Plugging his own numbers for dissipation into the fluctuation-dissipation relation, a typical thermal displacement of a rod during a cycle is going to be on the order of the 0.7nm error threshold for his proposed design in Nanosystems.
That dissipation is already at the limit (from Akhiezer damping) of what defect-free bulk diamond could theoretically achieve at the proposed frequency of operation even if somehow all thermoelastic damping, friction, and acoustic radiation could be engineered away. An assembly of non-bonded rods sliding against and colliding with one another ought to have something like 3 orders of magnitude worse noise and dissipation from fundamental processes alone, irrespective of clever engineering, as a lower bound. Assemblies like this in general, not just the nanomechanical computer, aren’t going to operate with nanometer precision at room temperature.
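For intuition about the kind of estimate described above, one simple way to turn a drag figure into a per-cycle displacement scale is the Einstein relation D = kT/γ, a special case of fluctuation-dissipation. The drag coefficient and cycle time below are placeholder assumptions, not figures from Nanosystems; the point is only how the displacement scales with drag and cycle time.

```python
import math

kB, T = 1.380649e-23, 300.0   # J/K, K
gamma   = 1e-12               # kg/s: assumed drag on the rod's sliding DOF (placeholder)
t_cycle = 1e-9                # s: one cycle at ~1 GHz (assumed)

D     = kB * T / gamma                  # Einstein relation, m^2/s
x_rms = math.sqrt(2 * D * t_cycle)      # typical free diffusive displacement per cycle
print(f"{x_rms * 1e9:.1f} nm per cycle for these assumed values")  # ~2.9 nm
```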
edit: This was uncharitable. Sorry about that.
This comment suggested not leaving rods to flop around if they were vibrating.
The real concern was that positive control of the rods to the needed precision was impossible, as described below.
I’ve given it some thought, yes. Nanosystems proposes something like what you describe. During its motion, the rod is supposed to be confined to its trajectory by the drive mechanism, which, in response to deviations from the desired trajectory, rapidly applies forces much stronger than the net force accelerating the rod.
But the drive mechanism is also vibrating. That’s why I mentioned the fluctuation-dissipation theorem—very informally, it doesn’t matter what the drive mechanism looks like. You can calculate the noise forces based on the dissipation associated with the positional degree of freedom.
There’s a second fundamental problem in positional uncertainty due to backaction from the drive mechanism. Very informally, if you want your confining potential to put your rod inside a range Δx with some response speed (bandwidth), then the fluctuations in the force obey Δx·ΔF ≥ (ℏ/2)×bandwidth, from standard uncertainty principle arguments. But those fluctuations themselves impart positional noise. Getting the imprecision safely below the error threshold in the presence of thermal noise puts backaction in the range of thermal forces.
Sorry for the previous comment. I misunderstood your original point.
My original understanding was that the fluctuation-dissipation relation connects lossy dynamic quantities (e.g., electrical resistance, viscous drag) to corresponding thermal noise (Johnson–Nyquist noise, Brownian force). So Drexler has some figure for (essentially) viscous damping of a rod inside a guide channel, and this predicts some thermal W/Hz/(meter of rod) spectral noise power density. That was what I thought initially and led to my first comment. If the rods are moving around, then just hold them in position, right?
This is true but incomplete.
But the drive mechanism is also vibrating. That’s why I mentioned the fluctuation-dissipation theorem—very informally, it doesn’t matter what the drive mechanism looks like. You can calculate the noise forces based on the dissipation associated with the positional degree of freedom.
You pointed out that a similar phenomenon exists in *whatever* controls linear position. Springs have associated damping coefficients, so the damping coefficient in the spring-extension DOF has associated thermal noise. In theory this can be zero, but some practical minimum exists, represented by e.g. “defect-free bulk diamond”, which gives some minimum practical noise power per unit force.
Concretely, take a block of diamond and apply the max allowable compressive force. This is the lowest dissipation spring that can provide that much force. Real structures will be much worse.
Going back to the rod logic system, if I “drive” the rod by covalently bonding one end to the structure, will it actually move 0.7 nm? (The C-C bond length is ~0.15 nm; a linear spring model says the bond should break at about +0.17 nm extension (350 kJ/mol, 40 N/m stiffness).) That *is* a way to control position … so if you’re right, the rod should break the covalent bond. My intuition is that thermal energy doesn’t usually do that.
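A quick check of that bond-breaking arithmetic, treating the C-C bond as a linear spring that breaks when its stored elastic energy reaches the bond energy:

```python
import math

bond_energy = 350e3 / 6.022e23   # J per bond, from 350 kJ/mol
k_bond      = 40.0               # N/m, quoted bond stiffness

x_break = math.sqrt(2 * bond_energy / k_bond)        # (1/2) k x^2 = bond energy
print(f"break extension ~ {x_break * 1e9:.2f} nm")   # ~0.17 nm, matching the estimate
```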
What are the numbers you’re using (bandwidth, stiffness, etc.)?
Does your math suggest that in the static case rods will vibrate out of position? Maybe I’m misunderstanding things.
During its motion, the rod is supposed to be confined to its trajectory by the drive mechanism, which, in response to deviations from the desired trajectory, rapidly applies forces much stronger than the net force accelerating the rod.
(Nanosystems p. 344, fig. 12.2) Having the text in front of me now: the rods supposedly have “alignment knobs” which limit their range of motion. The drive springs don’t have to define rod position to within the error threshold during motion.
The knob–channel contact could be much more rigid than the spring, depending on interatomic repulsion. That’s a lot closer to the “covalently bond the rod to the structure” hypothetical suggested above. If the fluctuation-dissipation-based argument holds, the opposing force and stiffness will be on the order of bond stiffness/strength.
There’s a second fundamental problem in positional uncertainty due to backaction from the drive mechanism. Very informally, if you want your confining potential to put your rod inside a range Δx with some response speed (bandwidth), then the fluctuations in the force obey Δx·ΔF ≥ (ℏ/2)×bandwidth, from standard uncertainty principle arguments. But those fluctuations themselves impart positional noise. Getting the imprecision safely below the error threshold in the presence of thermal noise puts backaction in the range of thermal forces.
When I plug the hypothetical numbers into that equation (10 GHz, 0.7 nm), I get force deviations in the fN range (~1.5e-15 N), which is six orders of magnitude below the nanonewton-range forces proposed for actuation. This margin should accommodate even the pessimistic “characteristic frequency of rod vibration” (10 THz), along with some narrowing of positional uncertainty.
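Plugging those numbers back into the quoted relation as a sanity check (with the factor of 1/2 included; dropping it gives the ~1.5e-15 N quoted above):

```python
hbar = 1.054571817e-34    # J*s
bandwidth = 10e9          # Hz
dx = 0.7e-9               # m, error threshold used as the positional range

dF = hbar * bandwidth / (2 * dx)
print(f"minimum force fluctuation ~ {dF:.1e} N")
# ~7.5e-16 N, i.e. femtonewtons: roughly six orders of magnitude
# below nanonewton-scale actuation forces.
```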
That aside, these are atoms. The de Broglie wavelength for a single carbon atom at room temperature is about 0.04 nm, and we’re dealing with many carbon atoms bonded together. Are quantum mechanical effects really still significant?
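For reference, the 0.04 nm figure follows from λ = h / (m·v_rms) for a single carbon atom at room temperature:

```python
import math

h, kB = 6.62607015e-34, 1.380649e-23
m, T  = 12 * 1.66054e-27, 300.0        # kg (carbon-12), K

v_rms = math.sqrt(3 * kB * T / m)      # thermal RMS speed
print(f"de Broglie wavelength ~ {h / (m * v_rms) * 1e9:.3f} nm")  # ~0.04 nm
```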
If you’re right, and if the numbers are conservative, with real damping coefficients 3 OOM higher, forces would be 1.5 OOM higher, meaning covalent bonds would hold things together much less well. This seems wrong; benzyl groups, for example, would then regularly fall off of rigid molecules. Perhaps the rods are especially rigid, leading to better coupling of thermal noise into the anchoring bond at lower atom counts?
Certainly, if Drexler’s design is off by 3 orders of magnitude, rod logic would perform much less well.
No worries, my comment didn’t give much to go on. I did say “a typical thermal displacement of a rod during a cycle is going to be on the order of the 0.7nm error threshold for his proposed design”, which isn’t true if the mechanism works as described. It might have been better to frame it as—you’re in a bad situation when your thermal kinetic energy is on the order of the kinetic energy of the switching motion. There’s no clean win to be had.
If the positional uncertainty were close to the error limit, could we just bump up the logic element size (2x, 3x, 10x)? I’d assume scaling things up by some factor would reduce the relative effects of thermal noise and uncertainty.
That’s correct, although it increases power requirements and introduces low-frequency resonances to the logic elements.
Also, the expression (Δx·ΔF ≥ (ℏ/2)×bandwidth) suggests the second concern might be clock rate?
In this design, the bandwidth requirement is set by how quickly a blocked rod will pass if the blocker fluctuates out of the way. If slowing the clock rate 10x includes reducing all forces by a factor of 100 to slow everything down proportionally, then yes, this lets you average away backaction noise like √10 while permitting more thermal motion. If you keep making everything both larger and slower, it will eventually work, yes. Will it be competitive with field-effect transistors? Practically, I doubt it, but it’s harder to find in-principle arguments at that level.
That noted, in this design (I think) a blocked rod is tensioned with ~10x the switching drive force, so you’d want the response time of the restoring force to be ~10 ps. If your Δx is the same as the error threshold, then you’re admitting error rates of 10^−1. Using (100 GHz, 0.07 nm) [Drexler seems to claim 0.02 nm in 12.3.7b], the quantum-limited force noise spectral density is a few times less than the thermal force noise related to the claimed drag on the 1 GHz cycle.
What I’m saying isn’t that the numbers in Nanosystems don’t keep the rod in place. These noise forces are connected with displacement noise by the stiffness of the mechanism, as you observe. What I’m saying is that these numbers are so close to quantum limits that they can’t be right, or even within a couple of orders of magnitude of right. As you say, quantum effects shouldn’t be relevant. By the same token, noise and dissipation should be far above quantum limits.
Yeah, transistor-based designs also look promising. Insulation on the order of 2-3 nm suffices to prevent tunneling leakage, and speeds are faster. Promises of quasi-reversibility, low power, and the absurdly small element size made rod logic appealing if feasible. I’ll settle for clock speeds a factor of 100 higher even if you can’t fit a microcontroller in a microbe.
My instinct is to look for low-hanging design optimizations to salvage performance (e.g., drive-system changes to make the forces on rods at the end of travel and on blocked rods equal, reducing the speed of errors and removing most of that 10x penalty). Maybe enough of those can cut the required scale-up to the point where it’s competitive in some areas with transistors.
But we won’t know any of this for sure unless it’s built. If thermal noise is 3 OOM worse than Drexler’s figures, it’s all pointless anyway.
I remain skeptical that the system will move significant fractions of a bond length if a rod is held by a potential well formed by interatomic repulsion on one of the “alignment knobs” plus a mostly constant drive-spring force. The stiffness and maximum force should be perhaps half that of a C-C bond, and the energy required to move the rod out of position would be 2-3x that needed to break a C-C bond, since the spring can keep applying force over the error-threshold distance. Alternatively, the system *is* built that aggressively, such that thermal noise is enough to break things in normal operation, which is a big point against it.
Just to follow up, I spell out an argument for a lower bound on dissipation that’s 2-3 OOM higher in Appendix C here.
I’m not sure how to evaluate this, so I made a Manifold market for it. I’d be excited for you to help me edit the market if you endorse slightly different wording.
https://manifold.markets/ThomasKwa/does-thermal-noise-make-drexlerian
Yes, you need some kind of switch for any mechanical computer. My point was that you need multiple mechanical “amplifiers” for each single positioner arm, the energy usage of that would be substantial, and if you have a binary mechanical switch controlling a relatively large movement, then the thermal noise will put it in an intermediate state a lot of the time so the arm position will be off.
That’s not how computers work (either the ones we have today or the proposed rod-logic ones). Each rod or wire represents a single on/off bit.
Yes, doing mechanosynthesis is more complicated, and precise sub-nm control of a tooltip may not be competitive with biology for self-replication. But if the AI wants a substrate to think on that can implement lots of FLOPs, then molecular rod logic will work.
For that matter, protein-based mechanical or hybrid electromechanical computers are plausible, likely with lower energy consumption per erased bit than neurons and certainly with more density. Human-built computers have nm-sized transistors. There’s no reason to think that neurons and synapses are the most efficient sort of biological computer.
There’s no reason to think that neurons and synapses are the most efficient sort of biological computer.
Bio-neuron-based brains are extremely efficient, and close to Pareto-optimal. We are near the end of Moore’s law, and the viable open routes for forward progress in energy efficiency are essentially neuromorphic.
edit: continued partially in the original article
That post makes a fundamental error about wiring energy efficiency by ignoring the roughly 8 OOM difference in electrical conductivity between neuronal saltwater and copper (~0.5 S/m vs ~50 MS/m).
There’s almost certainly a factor-of-100 energy efficiency gain to be had by switching from saltwater to copper in the brain and reducing capacitance by thinning the wires. I’ll be leaving a comment soon, but that had to be said.
The agreement in energy/bit/(linear distance) points to an underlying principle of “if you’ve thinned the wires, why haven’t you packed everything in tighter?”, leading to similar capacitance and therefore similar energy per unit length.
Face-to-face die-stacking results suggest that computers could be much more efficient if they weren’t limited to 2D packing of logic elements. A second logic layer more than halved power consumption at the same performance, and that’s with limited interconnect density between the two logic dies.
The Cu vs. saltwater conductivity difference leads to better utilisation of wiring capacitance to reduce thermal noise voltage at transistor gates. Concretely, there are more electrons able to effectively vote on the output voltage. For very short interconnects this matters less, but long-distance or high-fanout nodes have lots of capacitance, and low-resistance wires make the voltage much more stable.
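As a rough illustration of the “more electrons voting” point, here is the thermal (kT/C) voltage noise on a node and the number of charge carriers defining its level, for an assumed node capacitance and logic swing (both placeholder values, not measurements of any particular process):

```python
import math

kB, T, q = 1.380649e-23, 300.0, 1.602e-19
C_node  = 1e-15    # F: assumed capacitance of a long or high-fanout node (placeholder)
V_swing = 0.8      # V: assumed logic swing (placeholder)

v_noise   = math.sqrt(kB * T / C_node)   # RMS thermal voltage on the node
electrons = C_node * V_swing / q         # carriers storing the logic level
print(f"kT/C noise ~ {v_noise * 1e3:.1f} mV across ~{electrons:.0f} electrons")
```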
Electrical conduction through “neuron saltwater” is not how neuronal interconnect works; it’s electrochemical. You are simply mistaken, as copper interconnect wire energy limits and neuron wire energy efficiency limits are essentially the same, and both approach the theoretical Landauer minimum as explained in the article.
Mandatory footnote for this comment:
The Landauer limit puts the energy cost to erase a bit at about 0.02eV at room temperature. For comparison, the energy in a single photon of visible light is about 1eV. Already we can see that the brain is not going to get anywhere close to this. 1eV is a molecular energy scale, not a cellular one.
The brain requires about 20 Watts of power. Running this directly through the Landauer limit, we get 10^21 bits erased per second. For comparison, the number of synapses is about 2*10^14 (pulled from jacob_cannell’s post linked above) and this gives about 600kB of data erased per synapse per second. This is not a reasonable number! It’s justified in the post by assuming that we’re banned from using regular digital logic to implement binary arithmetic and are instead forced into using heaps of “counters” where the size of the heap is the number you’re representing, and this comes along with shot noise, of course.
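Reproducing that arithmetic, with the same rounding to 10^21 bits/s used in the comment:

```python
import math

kB, T, eV = 1.380649e-23, 300.0, 1.602e-19
landauer  = kB * T * math.log(2)               # J per erased bit
print(f"kT ln 2 ~ {landauer / eV:.3f} eV")     # ~0.018 eV, i.e. about 0.02 eV

brain_power = 20.0                             # W
print(f"{brain_power / landauer:.1e} bit/s")   # ~7e21, i.e. order 10^21

synapses = 2e14
per_synapse_bytes = 1e21 / synapses / 8        # using the rounded 10^21 figure
print(f"~{per_synapse_bytes / 1e3:.0f} kB erased per synapse per second")  # ~600 kB
```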
The section on “interconnect” similarly assumes that we’re forced to dissipate a certain amount of energy per bit transferred per unit length of interconnection. We’re banned from using superconducting interconnect, or any other creative solution here. Also, if we could shrink everything, the required length of interconnect would be shorter, but the post just does the calculation for things being normal brain size.
I’d further argue that, even if interconnect requirements are as a matter of engineering practicality close to the limits of what we can build, we should not confuse that with being “close to the thermodynamic limits”. Moving a bit from here to there should have no thermodynamic cost, and if we can’t manage it except by dissipating a huge amount of energy, then that’s a fact about our engineering skills, not a fact about the amount of computation the brain is doing.
In short, if you assume that you have to do things the way the brain does them, then the brain is somewhat close to “thermodynamic limits”, but without those assumptions it’s nowhere near the actual Landauer limit.
The Landauer limit puts the energy cost to erase a bit at about 0.02eV at room temperature.
No it does not—that is one of many common layman misunderstandings, which the article corrects. The practical Landauer limit (for fast reliable erasures) is closer to 1eV.
It’s justified in the post by assuming that we’re banned from using regular digital logic to implement binary arithmetic and are instead forced into using heaps of “counters”
Digital multipliers use similar or more energy for low precision multiply but are far larger, as discussed in the article with numerous links to research literature. (And most upcoming advanced designs for approaching brain energy efficiency use analog multipliers—as in memristor crossbar designs).
The section on “interconnect” similarly assumes that we’re forced to dissipate a certain amount of energy per bit transferred per unit length of interconnection.
That is indeed how conventional computing works.
Also, if we could shrink everything, the required length of interconnect would be shorter, but the post just does the calculation for things being normal brain size.
You obviously didn’t read the post as indeed it discusses this - see the section on size and temperature.
Moving a bit from here to there should have no thermodynamic cost, and if we can’t manage it except by dissipating a huge amount of energy, then that’s a fact about our engineering skills,
As discussed in the post—you absolutely can move bits without dissipating much energy using reversible interconnect (ie optics), but this does not come without enormous fundamental disadvantages in size.
No it does not—that is one of many common layman misunderstandings, which the article corrects. The practical Landauer limit (for fast reliable erasures) is closer to 1eV.
So this is how the 1eV value is derived, right? Start with a bit that we want to erase. Set things up so there’s an energy gap of ΔE between the 0 state and the 1 state. Then couple to the environment, and wait for some length of time, so the probability that the bit has a value of 0 becomes:
1 / (1 + e^(−βΔE))
This is the probability of successful erasure, and if we want to get a really high probability, we need to set ΔE = 50kT or something like that.
But instead imagine that we’re trying to erase 100 bits all at once. Now we set things up so that the 2^100 − 1 bit strings that aren’t all zeros have an energy of ΔE and the all-zeros bit string has an energy of 0. Now if we couple to the environment, we get the following probability of successful erasure of all the bits:
1 / (1 + (2^100 − 1) e^(−βΔE))
This is approximately equal to:
1 / (1 + 2^100 e^(−βΔE)) = 1 / (1 + e^(100 log 2 − βΔE))
Now, to make the probability of successful erasure really high, we can pick:
ΔE = 50kT + 100(kT log 2)
The 100(kT log 2) is there to cancel the 100 log 2 in the exponent; this is just the familiar Landauer limit. And the 50kT is there to make sure that we get the same level of reliability as before. But that 50kT is now amortized over 100 bits, so the extra reliability cost per bit is much less. So if I’m not wrong, the theoretical limit per bit should still be kT log 2.
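A quick numerical check of the amortized-erasure argument above (exact arithmetic, energies in units of kT):

```python
import math

ln2 = math.log(2)

# Single bit with an energy gap of 50 kT:
p_fail_1 = math.exp(-50) / (1 + math.exp(-50))   # ~1.9e-22

# 100 bits erased jointly, with dE = 50 kT + 100 kT ln 2:
n = 100
dE = 50 + n * ln2                                # in units of kT
weight = (2**n - 1) * math.exp(-dE)
p_fail_n = weight / (1 + weight)                 # ~1.9e-22 again

print(p_fail_1, p_fail_n)
print(f"energy per bit: {dE / n:.3f} kT vs kT ln 2 = {ln2:.3f} kT")
# The ~0.5 kT/bit difference is the amortized reliability overhead.
```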
The article has links to the 3 good sources (Landauer, Zhirnov, Frank) for this derivation. I don’t have time to analyze your math in detail but I suspect you are starting with the wrong setup—you need a minimal energy well to represent a bit stably against noise at all, and you pay that price for each bit, otherwise it isn’t actually a bit.
My prior that you find an error in the physics lit here is extremely low—this is pretty well established at this point.
I’ve taken a look at Michael P. Frank’s paper and it doesn’t seem like I’ve found an error in the physics lit. Also, I still 100% endorse my comment above: The physics is correct.
So your priors check out, but how can both be true?
you need a minimal energy well to represent a bit stably against noise at all, and you pay that price for each bit, otherwise it isn’t actually a bit.
To use the terminology in Frank, this is E_sig you’re talking about. My analysis above applies to E_diss. Now in section 2 of Frank’s paper, he says:
“With this particular mechanism, we see that E_diss = E_sig; later, we will see that in other mechanisms, E_diss can be made much less than E_sig.”
The formula kT log r shows up in section 2, before Frank moves on to talking about reversible computing. In section 3, he gives adiabatic switching as an example of a case where E_diss can be made much smaller than E_sig. (Though other mechanisms are also possible.) About midway through section 4, Frank uses the standard kT log 2 value, since he’s no longer discussing the restricted case where E_diss = E_sig.
Adiabatic computing is a form of partial reversible computing.
If you can only erase bits 100 at a time, you don’t really have 100 bits, do you? Now your thermal state just equalizes probabilities across those nonzero bit strings.
You obviously didn’t read the post as indeed it discusses this—see the section on size and temperature.
That point (compute energy / system surface area) assumes we can’t drop clock speed. If cooling were the binding constraint, drop the clock speed and now we can reap gains in efficiency from miniaturization.
Heat dissipation scales linearly with size for a constant ΔT. Shrink a device by a factor of ten and the driving thermal gradient increases in steepness by ten while the cross sectional area of the material conducting that heat goes down by 100x. So if thermals are the constraint, then scaling linear dimensions down by 10x requires reducing power by 10x or switching to some exotic cooling solution (which may be limited in improvement OOMs achievable).
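A small sketch of that conduction-scaling argument, for heat conducted through a layer of material at fixed ΔT (the conductivity and sizes are arbitrary placeholder values; only the ratio matters):

```python
def conducted_heat(kappa, area, dT, path_length):
    """Steady-state conduction: Q = kappa * A * dT / L."""
    return kappa * area * dT / path_length

kappa, dT = 0.6, 5.0                                  # W/(m*K), K (placeholders)
Q_full  = conducted_heat(kappa, (1e-2)**2, dT, 1e-2)  # device at ~1 cm scale
Q_small = conducted_heat(kappa, (1e-3)**2, dT, 1e-3)  # every dimension shrunk 10x

print(Q_full / Q_small)   # 10.0: removable heat drops 10x, so power must drop 10x too
```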
But if we assume constant energy per bit per unit of linear distance, reducing wire length by 10x cuts power consumption by 10x. Only if you want to increase clock speed by 10x (since propagation velocity is unchanged and signals travel less distance) does power go back up. In fact, wire thinning to reduce propagation speed gets you a small amount of added power savings.
All that assumes the logic will shrink, which is not a given.
Added points regarding cooling improvements:
Brain power density of 20 mW/cc is quite low.
ΔT is pretty small (single-digit °C).
Switching to temperature-tolerant materials for higher ΔT gives 1-1.5 OOM.
Phase-change cooling gives another 1 OOM.
Increasing pump power/coolant volume is the biggie, since even a few MPa is doable without being counterproductive or increasing the power budget much (2-3 OOM).
Even if cooling is hard-binding, if interconnect density increases, we can downsize a bit less and devote more volume to cooling.
The brain is already at minimal viable clock rate.
Your comment now seems largely in agreement: reducing wire length 10x cuts interconnect power consumption by 10x but surface area decreases 100x so surface power density increases 10x. That would result in a 3x increase in temp/cooling demands which is completely unviable for a bio brain constrained to room temp and already using active liquid cooling and the entire surface of the skin as a radiator.
Digital computers of course can—and do—go much denser/hotter, but that ends up ultimately costing more energy for cooling.
So anyway the conclusion of that section was:
Conclusion: The brain is perhaps 1 to 2 OOM larger than the physical limits for a computer of equivalent power, but is constrained to its somewhat larger than minimal size due in part to thermodynamic cooling considerations.
What sets the minimal clock rate? Increasing wire resistance and reducing the number of ion channels and pumps proportionally should just work (ignoring leakage).
It is certainly tempting to run at higher clock speeds (serial thinking speed is a nice feature), but if miniaturization can be done and clock speeds must then be limited for thermal reasons, why can’t we just do that?
That aside, is miniaturization out of the question (i.e., the logic won’t shrink)? Is there a lower limit on the number of charge carriers needed for synapses to work?
Synapses are around 1 µm³, which seems big enough to shrink down a bit without weird quantum effects ruining everything. Humans have certainly made smaller transistors, or memristors for that matter. Perhaps some of the learning functionality would need to be stripped, but we do inference on models all the time without any continuous learning, and that’s still quite useful.
Signal propagation is faster in larger axons.
Evolutionary arms races: i.e., the need to think quickly to avoid becoming prey, to think fast enough to catch prey, etc.
That aside, is miniaturization out of the question (i.e., the logic won’t shrink)? Is there a lower limit on the number of charge carriers needed for synapses to work?
The prime overall size constraint seems to be surface/volume ratios and temperature, as we already discussed, but yes, synapses are already pretty minimal for what they do (they are analog multipliers and storage devices).
Synapses are equivalent to entire multipliers + storage devices + some extra functions, far more than transistors.
You might find this post interesting.