Or at least, it’s not really surprising that actual precise 8 bit analog multiplication…
I’m not sure what you mean by “precise 8 bit analog multiplication”, as analog is not precise in the way digital is. When I say 8-bit analog equivalent, I am talking about an analog operation that has SNR equivalent to quantized 8-bit digital, which is near the maximum useful range for analog multiplier devices, and near the upper range of estimates of synaptic precision.
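For concreteness, the equivalence I have in mind is the standard ideal-quantization rule of thumb, SNR ≈ 6.02·N + 1.76 dB for an N-bit quantizer with a full-scale input; a minimal sketch of the rough numbers:

```python
# Rule of thumb: ideal N-bit uniform quantization of a full-scale sine gives
# SNR ~= 6.02*N + 1.76 dB. Used here only to put rough numbers on
# "SNR equivalent to quantized 8-bit digital"; it is not a synapse model.
def ideal_quantization_snr_db(bits: int) -> float:
    return 6.02 * bits + 1.76

for b in (2, 7, 8):
    print(f"{b}-bit equivalent: ~{ideal_quantization_snr_db(b):.1f} dB SNR")
# 2-bit: ~13.8 dB, 7-bit: ~43.9 dB, 8-bit: ~49.9 dB
```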
I was actually imagining some kind of analogue to an 8-bit analog-to-digital converter. Or maybe an op amp? My analog circuits knowledge is very rusty.
But anyway, if you draw up a model of some synapses as an analog circuit with actual analog components, that actually illustrates one of my main objections pretty nicely: neurons won’t actually meet the same performance specifications as that circuit, even if they behave like or are modeled by those circuits for specific input ranges and a narrow operating regime.
An actual analog circuit has to meet precise performance specifications within a specified operating domain, whether it is built from an 8-bit or 2-bit ADC, a high- or low-quality op amp, etc.
If you draw up a circuit made out of neurons, the performance characteristics and specifications it meets will probably be much more lax. If you relax specifications for a real analog circuit in the same way, you can probably make the circuit out of much cheaper and lower-energy component pieces.
An actual CMOS analog circuit only has to meet those precision performance specifications because it is a design which must be fabricated and reproduced with precision over and over.
The brain doesn’t have that constraint, so it can to some extent learn to exploit the nuances of each specific subcircuit or device. This is almost obviously superior in terms of low-level circuit noise tolerance and space and energy efficiency, and is seen by some as the ultimate endpoint of Moore’s Law; see section 8 of Hinton’s Forward-Forward paper, which I quoted here.
Regardless, if you see a neurotransmitter synapse system that is using 10k carriers or whatever flooding through variadic memory-like protein gates, such that deep simulations of the system indicate it is doing something similar-ish to analog multiplication with an SNR equivalent to 7 bits or whatnot, and you have a bunch of other neuroscience and DL experiments justifying that interpretation, then that is probably what it is doing.
It is completely irrelevant whether it’s a ‘proper’ analog multiplication that would meet precise performance specifications in a mass produced CMOS device. All that matters here is its equivalent computational capability.
An actual CMOS analog circuit only has to meet those precision performance specifications because it is a design which must be fabricated and reproduced with precision over and over.
Mass production is one reason, but another reason this distinction actually matters is that these specifications are performance characteristics of the whole system, not of its constituent pieces. For both analog and digital circuits, these performance characteristics have very precise meanings.
Let’s consider flop/s for digital circuits.
If I can do 1M flop/s, that means, roughly: every second, you can give me 2 million floats, and I can multiply them together pairwise and give you 1 million results, 1 second later. I can do this over and over again every second, and the values can be arbitrarily distributed over the entire domain of floating point numbers at a particular precision.[1]
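To make that contract concrete, here is a toy sketch; the 1M rating and the random values are arbitrary, and this is an illustration of the interface rather than a benchmark:

```python
# Toy illustration of the throughput contract described above: a device rated
# at R multiply flop/s accepts 2*R floats per second and returns R pairwise
# products, sustained second after second. R here is a made-up rating.
import numpy as np

R = 1_000_000  # hypothetical rating: one million multiplies per second
a = np.random.rand(R).astype(np.float32)
b = np.random.rand(R).astype(np.float32)
products = a * b  # R results; a 1M flop/s device delivers a batch like this
                  # every second, for any values representable at this precision
```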
“Synaptic computations” in the brain, as you describe them, do not have any of these properties. The fact that 10^15 of them happen per second is not equivalent or comparable to 10^15 flop/s, because it is not a performance characteristic of the system as a whole.
By analogy: suppose you have some gas particles in a container, and you’d like to simulate their positions. Maybe simulating them in real time requires 10^15 flop/s, and there is provably no more efficient way to run your simulation.
Does that mean the particles themselves are doing 10^15 flop/s? No!
Saying the brain does “10^15 synaptic operations per second” is a bit like saying the particles in the gas are doing “10^15 particle operations per second”.
The fact that, in the case of the brain, the operations themselves are performing some kind of useful work that looks like a multiply-add, and that this is maybe within an OOM of some fundamental efficiency limit, doesn’t mean you can coerce the types arbitrarily to say that the brain itself is efficient as a whole.
As a less vacuous analogy, you could do a bunch of analysis on an individual CMOS gate from the 1980s, and find, perhaps, that it is “near the limit of thermodynamic efficiency” in the sense that every microjoule of energy it uses is required to make it actually work. Cooling + overclocking might let you push things a bit, but you’ll never be able to match the performance of re-designing the underlying transistors entirely at a smaller process (which often involves a LOT more than just shrinking individual transistors).
It is completely irrelevant whether it’s a ‘proper’ analog multiplication that would meet precise performance specifications in a mass produced CMOS device. All that matters here is its equivalent computational capability.
Indeed, brains and digital circuits have completely different computational capabilities and performance characteristics. That’s kind of the whole point.
[1] If I do this with a CPU, I might have full control over which pairs are multiplied. If I have an ASIC, the pair indices might be fixed. If I have an FPGA, they might be fixed until I reprogram it.
(If I do this with a CPU, I might have full control over which pairs are multiplied. If I have an ASIC, the pair indices might be fixed. If I have an FPGA, they might be fixed until I reprogram it.)
The only advantage of a CPU/GPU over an ASIC is that the CPU/GPU is programmable after device creation. If you know what calculation you want to perform, you use an ASIC and avoid the enormous inefficiency of the CPU/GPU simulating the actual circuit you want to use. An FPGA is somewhere in between.
The brain uses active rewiring (and synapse growth/shrinkage) to physically adapt the hardware, which has the flexibility of an FPGA for the purposes of deep learning, but the efficiency of an ASIC.
As a less vacuous analogy, you could do a bunch of analysis on an individual CMOS gate from the 1980s, and find, perhaps, that it is “near the limit of thermodynamic efficiency”
Or you could make the same argument about a pile of rocks, or a GPU, as I noted earlier. The entire idea of computation is a map-territory enforcement: it always requires a mapping between a logical computation and physics.
If you simply assume—as you do—that the brain isn’t computing anything useful (i.e. nothing equivalent to deep learning operations, an equivalence I believe is overwhelmingly supported by the evidence), then you can always claim that, but I see no reason to pay attention whatsoever. I suspect you simply haven’t spent the requisite many thousands of hours reading the right DL and neuroscience.
The only advantage of a CPU/GPU over an ASIC is that the CPU/GPU is programmable after device creation. If you know what calculation you want to perform, you use an ASIC and avoid the enormous inefficiency of the CPU/GPU simulating the actual circuit you want to use
This has a kernel of truth but it is misleading. There are plenty of algorithms that don’t naturally map to circuits, because a step of an algorithm in a circuit costs space, whereas a step of an algorithm in a programmable computer costs only those bits required to encode the task. The inefficiency of dynamic decode can be paid for with large enough algorithms. This is most obvious when considering large tasks on very small machines.
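A back-of-envelope sketch of that distinction, with entirely made-up numbers:

```python
# Back-of-envelope sketch (all numbers are assumptions): hard-wiring every step
# of a long sequential algorithm costs circuit area per step, while a
# programmable machine pays only the bits needed to encode the task (plus a
# fixed decode overhead) and reuses the same hardware across steps.
steps = 10**9              # a 1e9-step sequential algorithm
gates_per_step = 100       # assumed gates to hard-wire one step
unrolled_gates = steps * gates_per_step   # ~1e11 gates, each used once per run
program_bits = 10 * 32                    # ~10 instructions encoding a loop
print(f"unrolled circuit: ~{unrolled_gates:.1e} gates")
print(f"stored program:   ~{program_bits} bits, reusing one small ALU over time")
```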
It is true that neither GPUs nor CPUs seem particularly Pareto optimal for their broad set of tasks, versus a cleverer clean-sheet design, and it is also true that for any given task you could likely specialize a CPU or GPU design for it somewhat easily, for at least marginal benefit. But I also think this is not the default way your comment would be interpreted.
If you simply assume—as you do—that the brain isn’t computing anything useful
I do not assume this, but I am claiming that something remains to be shown, namely, that human cognition irreducibly requires any of those 10^15 “synaptic computations”.
Showing such a thing necessarily depends on an understanding of the nature of cognition at the software / algorithms / macro-architecture level. Your original post explicitly disclaims engaging with this question, which is perfectly fine as a matter of topic choice, but you then can’t make any claims which depend on such an understanding.
Absent such an understanding, you can still make apples-to-apples comparisons about overall performance characteristics between digital and biological systems. But those _must_ be grounded in an actual precise performance metric of the system as a whole, if they are to be meaningful at all.
Component-wise analysis is not equivalent to system-wide analysis, even if your component-wise analysis is precise and backed by a bunch of neuroscience results and intuitions from artificial deep learning.
FYI for Jacob and others, I am probably not going to further engage directly with Jacob, as we seem to be mostly talking past each other, and I find his tone (“this is just nonsense”, “completely irrelevant”, “suspect you simply haven’t spent...”, etc.) and style of argument to be tiresome.
I am claiming that something remains to be shown, namely, that human cognition irreducibly requires any of those 10^15 “synaptic computations”.
Obviously it requires some of those computations, but in my ontology the question of how many is clearly a software efficiency question. The fact that an A100 can do ~1e15 low precision op/s (with many caveats/limitations) is a fact about the hardware that tells you nothing about how efficiently any specific A100 may be utilizing that potential. I claim that the brain can likewise do very roughly 1e15 synaptic ops/s, but that questions of utilization of that potential towards intelligence are likewise circuit/software efficiency questions (which I do address in some of my writing, but it is specifically out of scope for this particular question of synaptic hardware.)
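A trivial numerical illustration of that peak-versus-utilization distinction (the achieved figure is purely an assumption):

```python
# Peak rating vs. utilization: the hardware number alone says nothing about how
# well any particular workload exploits it. The achieved figure is made up.
peak_ops_per_s = 1e15        # advertised low-precision peak of the accelerator
achieved_ops_per_s = 3e14    # what some hypothetical workload actually sustains
print(f"utilization: {achieved_ops_per_s / peak_ops_per_s:.0%}")  # -> 30%
```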
Showing such a thing necessarily depends on an understanding of the nature of cognition at the software / algorithms / macro-architecture level. Your original post explicitly disclaims engaging with this question,
My original post does engage with this somewhat, in the circuit efficiency section. I draw the circuit/software distinction around architectural priors and learning algorithms (genetic/innate) vs. acquired knowledge/skills (cultural).
I find his tone (“this is just nonsense”,
I used that in response to you saying “but if neurons are less repurposable and rearrangeable than transistors”, which I do believe is actually nonsense: neural circuits literally dynamically rewire themselves, which allows the flexibility of FPGAs (for circuit learning) combined with the efficiency of ASICs, whereas transistors are fixed circuits that are not dynamically modifiable at all.
If I were to try to steelman your position, it would simply be that we cannot be sure how efficiently the brain utilizes the potential of its supposed synaptic computational power.
To answer that question, I have provided some of the relevant arguments in my past writing. But at this point, the enormous success of DL (which I predicted well in advance) towards AGI, the great extent to which it has reverse engineered the brain, and the fact that Moore’s Law shrinkage is petering out while the brain remains above the efficiency of our best accelerators, together entirely shift the burden onto you to write up detailed analysis/arguments as to how you can explain these facts.
To answer that question, I have provided some of the relevant arguments in my past writing. But at this point, the enormous success of DL (which I predicted well in advance) towards AGI, the great extent to which it has reverse engineered the brain, and the fact that Moore’s Law shrinkage is petering out while the brain remains above the efficiency of our best accelerators, together entirely shift the burden onto you to write up detailed analysis/arguments as to how you can explain these facts.
I think there’s just not that much to explain, here—to me, human-level cognition just doesn’t seem that complicated or impressive in an absolute sense—it is performed by a 10W computer designed by a blind idiot god, after all.
The fact that current DL paradigm methods inspired by its functionality have so far failed to produce artificial cognition of truly comparable quality and efficiency seems more like a failure of those methods than a success, at least so far. I don’t expect this trend to continue in the near term (which I think we agree on), and grant you some Bayes points for predicting it further in advance.
If I were to try to steelman your position, it would simply be that we cannot be sure how efficiently the brain utilizes the potential of its supposed synaptic computational power.
I was actually referring to the flexibility and rearrangeability at design time here. Verilog and Cadence can make more flexible use of logic gates and transistors than the brain can make of neurons during a lifetime, and the design space available to circuit designers using these tools is much wider than the one available to evolution.