Thanks, that’s clarifying. (And yes, I’m well aware that x → B*x is almost never injective, which is why I said it wouldn’t cause 8 bits of erasure rather than the stronger, incorrect claim of 0 bits of erasure.)
To store 1 bit of information you need a potential energy barrier that’s at least as high as k_B T log(2), so you need to switch ~ 8 such barriers, which means in any kind of realistic device you’ll lose ~ 8 k_B T log(2) of electrical potential energy to heat, either through resistance or through radiation. It doesn’t have to be like this, and some idealized device could do better, but GPUs are not idealized devices and neither are brains.
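For concreteness, here's roughly what that comes to numerically (a sketch; the 310 K operating temperature is my assumption for a brain-like device):

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 310.0            # assumed operating temperature (~body temperature), K

e_bit = k_B * T * math.log(2)   # minimum barrier/erasure energy per bit
e_8bits = 8 * e_bit             # switching ~8 such barriers, as above
```

That works out to about 3e-21 J per bit, so a few times 1e-20 J for the 8 barriers, which any realistic switching mechanism will overshoot considerably.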
Two more points of confusion:
Why does switching barriers imply that electrical potential energy is probably being converted to heat? I don’t see how that follows at all.
To what extent do information storage requirements weigh on FLOPS requirements? It’s not obvious to me that requirements on energy barriers for long-term storage in thermodynamic equilibrium necessarily bear on transient representations of information in the midst of computations, either because the system is out of thermodynamic equilibrium or because storage times are very short.
Why does switching barriers imply that electrical potential energy is probably being converted to heat? I don’t see how that follows at all.
Where else is the energy going to go? Again, in an adiabatic device where you have a lot of time to discharge capacitors and such, you might be able to do everything in a way that conserves free energy. I just don’t see how that’s going to work when you’re (for example) switching transistors on and off at a high frequency. It seems to me that the only place to get rid of the electrical potential energy that quickly is to convert it into heat or radiation.
I think what I’m saying is standard in how people analyze power costs of switching in transistors, see e.g. this physics.se post. If you have a proposal for how you think the brain could actually be working to be much more energy efficient than this, I would like to see some details of it, because I’ve certainly not come across anything like that before.
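The standard analysis referenced there is the dynamic-power estimate: each full charge/discharge of a node dissipates roughly C V² to heat. A sketch with illustrative (assumed) device numbers:

```python
C = 1e-16   # assumed node capacitance, farads (~0.1 fF, illustrative)
V = 0.7     # assumed supply voltage, volts
f = 1e9     # assumed toggle rate, Hz

e_switch = C * V**2       # energy dumped to heat per charge/discharge cycle
p_dynamic = f * e_switch  # dissipated power if the node toggles every cycle
# e_switch here is ~4 orders of magnitude above k_B T ln 2 at room temperature.
```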
To what extent do information storage requirements weigh on FLOPS requirements? It’s not obvious to me that requirements on energy barriers for long-term storage in thermodynamic equilibrium necessarily bear on transient representations of information in the midst of computations, either because the system is out of thermodynamic equilibrium or because storage times are very short.
The Boltzmann factor roughly gives you the steady-state distribution of the associated two-state Markov chain, so if storage times are short enough it’s possible this equilibrium argument would be irrelevant. However, I think that in realistic devices the Markov chain reaches equilibrium far too quickly for you to get around the thermodynamic argument by appealing to the system being out of equilibrium.
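A minimal check of that first claim, with an assumed attempt frequency (which cancels out of the steady state anyway):

```python
import math

k_B_T = 4.1e-21                # thermal energy at ~300 K, joules
dE = 8 * k_B_T * math.log(2)   # an ~8-bit barrier height, as above

nu = 1e12                              # assumed attempt frequency, Hz
rate_up = nu * math.exp(-dE / k_B_T)   # Arrhenius rate over the barrier
rate_down = nu                         # barrier-free relaxation

# The steady state of the two-state chain recovers the Boltzmann factor:
p_high = rate_up / (rate_up + rate_down)
p_boltzmann = math.exp(-dE / k_B_T) / (1 + math.exp(-dE / k_B_T))
```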
My reasoning here is that the Boltzmann factor also gives you the odds of an electron having enough kinetic energy to cross the potential barrier upon colliding with it, so e.g. if you imagine an electron stuck in a potential well that’s O(k_B T) deep, the electron will only need to collide with one of the barriers O(1) times to escape. So the rate of convergence to equilibrium comes down to the length of the well divided by the thermal speed of the electron, which is going to be quite rapid as electrons at the Fermi level in a typical wire move at speeds comparable to 1000 km/s.
I can try to calculate exactly what you should expect the convergence time here to be for some configuration you have in mind, but I’m reasonably confident when the energies involved are comparable to the Landauer bit energy this convergence happens quite rapidly for any kind of realistic device.
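For instance, taking a ~10 nm well as my stand-in for "realistic device scale" (an assumption, not a specific proposal):

```python
well_length = 1e-8   # assumed well length: 10 nm
v_thermal = 1e6      # electron speed near the Fermi level, m/s (~1000 km/s)

t_collision = well_length / v_thermal   # time between barrier collisions
# With an O(k_B T) barrier, O(1) collisions suffice to escape, so
# equilibration happens on roughly this ~10 femtosecond timescale.
```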
Why does switching barriers imply that electrical potential energy is probably being converted to heat? I don’t see how that follows at all.
Where else is the energy going to go?
What is “the energy” that has to go somewhere? As you recognize, there’s nothing that says it costs energy to change the shape of a potential well. I’m genuinely not sure what energy you’re talking about here. Is it electrical potential energy spent polarizing a medium?
I think what I’m saying is standard in how people analyze power costs of switching in transistors, see e.g. this physics.se post.
Yeah, that’s pretty standard. The ultimate efficiency of a semiconductor field-effect transistor is bounded by the ~60 mV/decade subthreshold swing at room temperature, and modern tiny transistors have to deal with all sorts of problems like leakage current which make it difficult to even reach that limit.
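That limit is just thermionic emission over the gate barrier: the drain current changes by one decade per (k_B T / q) ln 10 of gate voltage, which is where the 60 mV figure comes from:

```python
import math

k_B = 1.380649e-23    # Boltzmann constant, J/K
q = 1.602176634e-19   # elementary charge, C
T = 300.0             # room temperature, K

# Thermionic ("Boltzmann tyranny") limit on subthreshold swing:
swing = (k_B * T / q) * math.log(10)   # volts per decade of drain current
```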
Unclear to me that semiconductor field-effect transistors have anything to do with neurons, but I don’t know how neurons work, so my confusion is more likely a state of my mind than a state of the world.
I don’t think transistors have too much to do with neurons beyond the abstract observation that neurons most likely store information by establishing gradients of potential energy. When the stored information needs to be updated, that means some gradients have to get moved around, and if I had to imagine how this works inside a cell it would probably involve some kind of proton pump operating across a membrane or something like that. That’s going to be functionally pretty similar to a capacitor, and discharging & recharging it probably carries similar free energy costs.
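To make the capacitor analogy concrete, here's a rough number for a membrane patch, using textbook order-of-magnitude values (~1 µF/cm² specific capacitance, ~70 mV potential); the patch area is my assumption:

```python
c_specific = 1e-2   # membrane capacitance, F/m^2 (~1 uF/cm^2)
area = 1e-12        # assumed patch area: ~1 square micron, m^2
V = 0.07            # membrane potential, volts

C = c_specific * area
e_patch = 0.5 * C * V**2   # free energy lost if discharged resistively
# ~2.4e-17 J per patch, several thousand times k_B T ln 2 at body temperature.
```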
I think what I don’t understand is why you’re defaulting to the assumption that the brain has a way to store and update information that’s much more efficient than what we’re able to do. That doesn’t sound like a state of ignorance to me; it seems like you wouldn’t hold this belief if you didn’t think there was a good reason to do so.
I think what I don’t understand is why you’re defaulting to the assumption that the brain has a way to store and update information that’s much more efficient than what we’re able to do. That doesn’t sound like a state of ignorance to me; it seems like you wouldn’t hold this belief if you didn’t think there was a good reason to do so.
It’s my assumption because our brains are AGI for ~20 W.
In contrast, many kW of GPUs are not AGI.
Therefore, it seems like brains have a way of storing and updating information that’s much more efficient than what we’re able to do.
Of course, maybe I’m wrong and it’s due to a lack of training or lack of data or lack of algorithms, rather than lack of hardware.
DNA storage is way more information dense than hard drives, for example.
It’s my assumption because our brains are AGI for ~20 W.
I think that’s probably the crux. I think the evidence that the brain is not performing that much computation is reasonably good, so I attribute the difference to algorithmic advantages the brain has, particularly ones that make the brain more data efficient relative to today’s neural networks.
That the brain is more data efficient is, I think, hard to dispute, though of course you can argue this is simply because the brain is doing a lot more computation internally to process the limited amount of data it does see. I’m more ready to believe that the brain has some software advantage over neural networks than that it has an enormous hardware advantage.