This does not explain how thousands of neurotransmitter molecules impinging on a neuron and thousands of ions flooding into and out of cell membranes, all irreversible operations, in order to transmit one spike, could possibly be within one OOM of the thermodynamic limit on efficiency for a cognitive system (running at that temperature).
See my reply here, which attempts to answer this. In short, if you accept that the synapse is doing the equivalent of all the operations involving a weight in a deep learning system (storing the weight, momentum/gradient EMA, etc. in minimal viable precision; multiplication for the forward pass, backward pass, and weight update; etc.), then the answer follows more or less directly from those requirements. If you are convinced that the synapse is only doing the equivalent of a single-bit AND operation, then obviously you will reach the conclusion that it is many OOM wasteful, but it's easy to demolish any notion that it is merely doing something so simple.[1]
[1] There are of course many types of synapses which perform somewhat different computations and thus have different configurations, sizes, energy costs, etc. I am mostly referring to the energy/compute-dominant cortical pyramidal synapses.
Nothing about any of those claims explains why the 10,000-fold redundancy of neurotransmitter molecules and ions being pumped in and out of the system is necessary for doing the alleged complicated stuff.
Is your point that the amount of neurotransmitter is precisely meaningful (so that spending some energy/heat on pumping one additional ion is doing on the order of a bit of “meaningful work”)?
I’m not sure what you mean precisely by “precisely meaningful”, but I do believe we know enough about how neural circuits and synapses work[1] to have some confidence that they must be doing something similar to their artificial analogs in DL systems.
So this minimally requires (sketched in code after the list):
storage for a K-bit connection weight in memory
(some synapses) nonlinear decoding of B-bit incoming neural spike signal (timing based)
analog ‘multiplication’[2] of incoming B-bit neural signal by K-bit weight
weight update from a local backpropagating Hebbian/gradient signal or equivalent
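As a rough illustration of what those four requirements amount to, here is a minimal sketch in Python/NumPy that treats a single synapse as a stored K-bit weight with a B-bit input, an 'analog' multiply, and a local gradient-like update. Everything here (the bit widths, the quantize helper, the learning rate) is my own hypothetical construction for illustration, not a claim about biological mechanism:

```python
import numpy as np

# Hypothetical sketch of the four operations listed above, with illustrative
# bit widths (roughly 8 bits for the stored weight, coarser for transient signals).
K_BITS = 8   # long-term weight precision
B_BITS = 4   # incoming spike-signal precision

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Round x onto a grid of 2**bits levels spanning [lo, hi]."""
    levels = 2 ** bits - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

class Synapse:
    def __init__(self, w0=0.1):
        self.w = quantize(w0, K_BITS)              # 1. K-bit weight storage

    def forward(self, spike_signal):
        x = quantize(spike_signal, B_BITS)         # 2. decode B-bit input signal
        return x * self.w                          # 3. 'analog' multiply

    def update(self, local_error, spike_signal, lr=0.01):
        # 4. local Hebbian/gradient-like weight update, re-stored at K bits
        x = quantize(spike_signal, B_BITS)
        self.w = quantize(self.w - lr * local_error * x, K_BITS)

syn = Synapse()
out = syn.forward(0.7)                             # this synapse's forward contribution
syn.update(local_error=out - 0.5, spike_signal=0.7)
```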
We know from DL that K and B do not need to be very large, but the optimal values are well above 1 bit; more importantly, the long-term weight storage (the equivalent of the gradient EMA/momentum) drives most of the precision demand, as it needs to accumulate many noisy measurements over time. From DL it looks like you want at least around 8 bits for long-term weight parameter storage, even if you can sample down to 4 bits or a bit lower for the forward/backward passes.
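To illustrate why the long-term store is the precision bottleneck, here is a toy experiment (again my own construction, with arbitrary constants): an exponential moving average of a weak signal buried in noise, accumulated into stores of different bit widths. Once the typical per-step update is smaller than one quantization level, the low-precision store can no longer integrate the evidence at all:

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_estimate(bits, steps=20_000, beta=0.99, signal=0.1, noise=1.0):
    """EMA of noisy measurements, rounded onto a (2**bits)-level grid on [-1, 1]."""
    step = 2.0 / (2 ** bits - 1)
    m = 0.0
    for _ in range(steps):
        g = signal + rng.normal(0.0, noise)      # noisy per-step measurement
        target = beta * m + (1.0 - beta) * g     # ideal EMA update
        m = float(np.clip(np.round(target / step) * step, -1.0, 1.0))
    return m

print("8-bit store:", ema_estimate(8))   # drifts toward the 0.1 signal, noisily
print("4-bit store:", ema_estimate(4))   # per-step update < one level: essentially never moves
```

The exact numbers are arbitrary, but the qualitative behavior is the point: the slow statistic that accumulates many noisy updates needs the most precision, which is why the stored weight wants ~8 bits even when the transient signals can be coarser.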
So that just takes a certain amount of work, and if you map out the minimal digital circuits in a maximally efficient hypothetical single-electron tile technology, you really do get something on the order of 1e5 minimal ~1 eV units or more.[3] Synapses are also efficient in the sense that they grow/shrink to physically represent larger/smaller logical weights, using more/fewer resources in a near-optimal fashion.
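For scale, a back-of-the-envelope conversion (my own numbers, as a sanity check rather than anything from the original comment): ~1e5 units of ~1 eV each per synaptic event, compared against the Landauer bound at body temperature:

```python
# Rough unit conversion; the 1e5 and ~1 eV figures come from the estimate above,
# the rest are standard physical constants.
k_B = 1.380649e-23      # Boltzmann constant, J/K
T = 310.0               # approximate body temperature, K
eV = 1.602e-19          # J per electronvolt

landauer_eV = k_B * T * 0.693 / eV      # kT ln 2 per irreversible bit erasure
per_event_J = 1e5 * 1.0 * eV            # ~1e5 units of ~1 eV each

print(f"Landauer bound at 310 K: ~{landauer_eV:.3f} eV per bit")   # ~0.019 eV
print(f"Per synaptic event:      ~{per_event_J:.1e} J")            # ~1.6e-14 J
```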
I have also argued the other side of this: there are some DL researchers who think the brain does many, many OOM more computation than it would seem, but we can rule that out with the same analysis.
[1] To those with the relevant background knowledge in DL, accelerator designs, and the relevant neuroscience.
[2] The actual synaptic operations are non-linear and more complex, but do something like the equivalent work of analog multiplication, and can't be doing dramatically more or less.
[3] This is not easy to do either and requires knowledge of the limits of electronics.
Thanks! (I’m having a hard time following your argument as a whole, and I’m also not trying very hard / being lazy / not checking the numbers; but I appreciate your answers, and they’re at least fleshing out some kind of model that feels useful to me.)