An actual CMOS analog circuit only has to meet those precision performance specifications because it is a design which must be fabricated and reproduced with precision over and over.
Mass production is one reason, but another reason this distinction actually is important is that they are performance characteristics of the whole system, not its constituent pieces. For both analog and digital circuits, these performance characteristics have very precise meanings.
Let’s consider flop/s for digitial circuits.
If I can do 1M flop/s, that means roughly, every second, you can give me 2 million floats, and I can multiply them together pairwise and give you 1 million results, 1 second later. I can do this over and over again every second, and the values can be arbitrarily distributed over the entire domain of floating point numbers at a particular precision.[1]
“Synaptic computations” in the brain, as you describe them, do not have any of these properties. The fact that 10^15 of them happen per second is not equivalent or comparable to 10^15 flop/s, because it is not a performance characteristic of the system as a whole.
By analogy: suppose you have some gas particles in a container, and you’d like to simulate their positions. Maybe the simulation requires 10^15 flop/s to simulate in real time, and there is provably no more efficient way to run your simulation.
Does that mean the particles themselves are doing 10^15 flop/s? No!
Saying the brain does “10^15 synaptic operations per second” is a bit like saying the particles in the gas are doing “10^15 particle operations per second”.
The fact that, in the case of the brain, the operations themselves are performing some kind of useful work that looks like a multiply-add, and that this is maybe within an OOM of some fundamental efficiency limit, doesn’t mean you can coerce the types arbitrarily to say the the brain itself is efficient as a whole.
As a less vacuous analogy, you could do a bunch of analysis on an individual CMOS gate from the 1980s, and find, perhaps, that it is “near the limit of thermodynamic efficiency” in the sense that every microjoule of energy it uses is required to make it actually work. Cooling + overclocking might let you push things a bit, but you’ll never be able to match the performance of re-designing the underlying transistors entirely at a smaller process (which often involves a LOT more than just shrinking individual transistors).
It is completely irrelevant whether it’s a ‘proper’ analog multiplication that would meet precise performance specifications in a mass produced CMOS device. All that matters here is its equivalent computational capability.
Indeed, Brains and digital circuits have completely different computational capabilities and performance characteristics. That’s kind of the whole point.
If I do this with a CPU, I might have full control over which pairs are multiplied. If I have an ASIC, the pair indices might be fixed. If I have an FPGA, they might be fixed until I reprogram it.
(If I do this with a CPU, I might have full control over which pairs are multiplied. If I have an ASIC, the pair indices might be fixed. If I have an FPGA, they might be fixed until I reprogram it.)
The only advantage of a CPU/GPU over an ASIC is that the CPU/GPU is programmable after device creation. If you know what calculation you want to perform you use an ASIC and avoid the enormous inefficiency of the CPU/GPU simulating the actual circuit you want to use. An FPGA is somewhere in between.
The brain uses active rewiring (and synapse growth/shrinkage) to physically adapt the hardware, which has the flexibility of an FPGA for the purposes of deep learning, but the efficiency of an ASIC.
As a less vacuous analogy, you could do a bunch of analysis on an individual CMOS gate from the 1980s, and find, perhaps, that it is “near the limit of thermodynamic efficiency”
Or you could make the same argument about a pile of rocks, or a GPU as I noticed earlier. The entire idea of computation is a map territory enforcement, it always requires a mapping between a logical computation and physics.
If you simply assume—as you do—that the brain isn’t computing anything useful (as equivalent to deep learning operations as I believe is overwhelming supported by the evidence), then you can always claim that, but I see no reason to pay attention whatsoever. I suspect you simply haven’t spent the requisite many thousands of hours reading the right DL and neuroscience.
The only advantage of a CPU/GPU over an ASIC is that the CPU/GPU is programmable after device creation. If you know what calculation you want to perform you use an ASIC and avoid the enormous inefficiency of the CPU/GPU simulating the actual circuit you want to use
This has a kernel of truth but it is misleading. There are plenty of algorithms that don’t naturally map to circuits, because a step of an algorithm in a circuit costs space, whereas a step of an algorithm in a programmable computer costs only those bits required to encode the task. The inefficiency of dynamic decode can be paid for with large enough algorithms. This is most obvious when considering large tasks on very small machines.
It is true that neither GPUs nor CPUs seem particularly pareto optimal for their broad set of tasks, versus a cleverer clean-sheet design, and it is also true that for any given task you could likely specialize a CPU or GPU design for it somewhat easily for at least marginal benefit, but I also think this is not the default way your comment would be interpreted.
If you simply assume—as you do—that the brain isn’t computing anything useful
I do not assume this, but I am claiming that something remains to be shown, namely, that human cognition irreducibly requires any of those 10^15 “synaptic computations”.
Showing such a thing necessarily depends on an understanding of the nature of cognition at the software / algorithms / macro-architecture level. Your original post explicitly disclaims engaging with this question, which is perfectly fine as a matter of topic choice, but you then can’t make any claims which depend on such an understanding.
Absent such an understanding, you can still make apples-to-apples comparisons about overall performance characteristics between digital and biological systems. But those _must_ be grounded in an actual precise performance metric of the system as a whole, if they are to be meaningful at all.
Component-wise analysis is not equivalent to system-wide analysis, even if your component-wise analysis is precise and backed by a bunch of neuroscience results and intuitions from artificial deep learning.
FYI for Jacob and others, I am probably not going to further engage directly with Jacob, as we seem to be mostly talking past each other, and I find his tone (“this is just nonsense”, “completely irrelevant”, “suspect you simply haven’t spent...”, etc.) and style of argument to be tiresome.
I am claiming that something remains to be shown, namely, that human cognition irreducibly requires any of those 10^15 “synaptic computations”.
Obviously it requires some of those computations, but in my ontology the question of how many is clearly a software efficiency question. The fact that an A100 can do ~1e15 low precision op/s (with many caveats/limitations) is a fact about the hardware that tells you nothing about how efficiently any specific A100 may be utilizing that potential. I claim that the brain can likewise do very roughly 1e15 synaptic ops/s, but that questions of utilization of that potential towards intelligence are likewise circuit/software efficiency questions (which I do address in some of my writing, but it is specifically out of scope for this particular question of synaptic hardware.)
Showing such a thing necessarily depends on an understanding of the nature of cognition at the software / algorithms / macro-architecture level. Your original post explicitly disclaims engaging with this question,
My original post does engage with this some in the circuit efficiency section. I draw the circuit/software distinction around architectural prior and learning algorithms (genetic/innate) vs acquired knowledge/skills (cultural).
I find his tone (“this is just nonsense”,
I used that in response to you saying “but if neurons are less repurposable and rearrangeable than transistors,”, which I do believe is actually nonsense, because neural circuits literally dynamically rewire themselves, which allows the flexibility of FPGAs (for circuit learning) combined with the efficiency of ASICs, and transistors are fixed circuits not dynamically modifiable at all.
If I was to try and steelman your position, it is simply that we can not be sure how efficiently the brain utilizes the potential of its supposed synaptic computational power.
To answer that question, I have provided some of the relevant arguments in my past writing, but at this point given the enormous success of DL (which I predicted well in advance) towards AGI and the great extent to which it has reverse engineered the brain, combined with the fact that moore’s law shrinkage is petering out and the brain remains above the efficiency of our best accelerators, entirely shifts the burden on to you to write up detailed analysis/arguments as to how you can explain these facts.
To answer that question, I have provided some of the relevant arguments in my past writing, but at this point given the enormous success of DL (which I predicted well in advance) towards AGI and the great extent to which it has reverse engineered the brain, combined with the fact that moore’s law shrinkage is petering out and the brain remains above the efficiency of our best accelerators, entirely shifts the burden on to you to write up detailed analysis/arguments as to how you can explain these facts.
I think there’s just not that much to explain, here—to me, human-level cognition just doesn’t seem that complicated or impressive in an absolute sense—it is performed by a 10W computer designed by a blind idiot god, after all.
The fact that current DL paradigm methods inspired by its functionality have so far failed to produce artificial cognition of truly comparable quality and efficiency seems more like a failure of those methods rather than a success, at least so far. I don’t expect this trend to continue in the near term (which I think we agree on), and grant you some bayes points for predicting it further in advance.
If I was to try and steelman your position, it is simply that we can not be sure how efficiently the brain utilizes the potential of its supposed synaptic computational power.
I was actually referring to the flexibility and re-arrangability at design time here. Verilog and Cadence can make more flexible use of logic gates and transistors than the brain can make of neurons during a lifetime, and the design space available to circuit designers using these tools is much wider than the one available to evolution.
Mass production is one reason, but another reason this distinction actually is important is that they are performance characteristics of the whole system, not its constituent pieces. For both analog and digital circuits, these performance characteristics have very precise meanings.
Let’s consider flop/s for digitial circuits.
If I can do 1M flop/s, that means roughly, every second, you can give me 2 million floats, and I can multiply them together pairwise and give you 1 million results, 1 second later. I can do this over and over again every second, and the values can be arbitrarily distributed over the entire domain of floating point numbers at a particular precision.[1]
“Synaptic computations” in the brain, as you describe them, do not have any of these properties. The fact that 10^15 of them happen per second is not equivalent or comparable to 10^15 flop/s, because it is not a performance characteristic of the system as a whole.
By analogy: suppose you have some gas particles in a container, and you’d like to simulate their positions. Maybe the simulation requires 10^15 flop/s to simulate in real time, and there is provably no more efficient way to run your simulation.
Does that mean the particles themselves are doing 10^15 flop/s? No!
Saying the brain does “10^15 synaptic operations per second” is a bit like saying the particles in the gas are doing “10^15 particle operations per second”.
The fact that, in the case of the brain, the operations themselves are performing some kind of useful work that looks like a multiply-add, and that this is maybe within an OOM of some fundamental efficiency limit, doesn’t mean you can coerce the types arbitrarily to say the the brain itself is efficient as a whole.
As a less vacuous analogy, you could do a bunch of analysis on an individual CMOS gate from the 1980s, and find, perhaps, that it is “near the limit of thermodynamic efficiency” in the sense that every microjoule of energy it uses is required to make it actually work. Cooling + overclocking might let you push things a bit, but you’ll never be able to match the performance of re-designing the underlying transistors entirely at a smaller process (which often involves a LOT more than just shrinking individual transistors).
Indeed, Brains and digital circuits have completely different computational capabilities and performance characteristics. That’s kind of the whole point.
If I do this with a CPU, I might have full control over which pairs are multiplied. If I have an ASIC, the pair indices might be fixed. If I have an FPGA, they might be fixed until I reprogram it.
The only advantage of a CPU/GPU over an ASIC is that the CPU/GPU is programmable after device creation. If you know what calculation you want to perform you use an ASIC and avoid the enormous inefficiency of the CPU/GPU simulating the actual circuit you want to use. An FPGA is somewhere in between.
The brain uses active rewiring (and synapse growth/shrinkage) to physically adapt the hardware, which has the flexibility of an FPGA for the purposes of deep learning, but the efficiency of an ASIC.
Or you could make the same argument about a pile of rocks, or a GPU as I noticed earlier. The entire idea of computation is a map territory enforcement, it always requires a mapping between a logical computation and physics.
If you simply assume—as you do—that the brain isn’t computing anything useful (as equivalent to deep learning operations as I believe is overwhelming supported by the evidence), then you can always claim that, but I see no reason to pay attention whatsoever. I suspect you simply haven’t spent the requisite many thousands of hours reading the right DL and neuroscience.
This has a kernel of truth but it is misleading. There are plenty of algorithms that don’t naturally map to circuits, because a step of an algorithm in a circuit costs space, whereas a step of an algorithm in a programmable computer costs only those bits required to encode the task. The inefficiency of dynamic decode can be paid for with large enough algorithms. This is most obvious when considering large tasks on very small machines.
It is true that neither GPUs nor CPUs seem particularly pareto optimal for their broad set of tasks, versus a cleverer clean-sheet design, and it is also true that for any given task you could likely specialize a CPU or GPU design for it somewhat easily for at least marginal benefit, but I also think this is not the default way your comment would be interpreted.
I do not assume this, but I am claiming that something remains to be shown, namely, that human cognition irreducibly requires any of those 10^15 “synaptic computations”.
Showing such a thing necessarily depends on an understanding of the nature of cognition at the software / algorithms / macro-architecture level. Your original post explicitly disclaims engaging with this question, which is perfectly fine as a matter of topic choice, but you then can’t make any claims which depend on such an understanding.
Absent such an understanding, you can still make apples-to-apples comparisons about overall performance characteristics between digital and biological systems. But those _must_ be grounded in an actual precise performance metric of the system as a whole, if they are to be meaningful at all.
Component-wise analysis is not equivalent to system-wide analysis, even if your component-wise analysis is precise and backed by a bunch of neuroscience results and intuitions from artificial deep learning.
FYI for Jacob and others, I am probably not going to further engage directly with Jacob, as we seem to be mostly talking past each other, and I find his tone (“this is just nonsense”, “completely irrelevant”, “suspect you simply haven’t spent...”, etc.) and style of argument to be tiresome.
Obviously it requires some of those computations, but in my ontology the question of how many is clearly a software efficiency question. The fact that an A100 can do ~1e15 low precision op/s (with many caveats/limitations) is a fact about the hardware that tells you nothing about how efficiently any specific A100 may be utilizing that potential. I claim that the brain can likewise do very roughly 1e15 synaptic ops/s, but that questions of utilization of that potential towards intelligence are likewise circuit/software efficiency questions (which I do address in some of my writing, but it is specifically out of scope for this particular question of synaptic hardware.)
My original post does engage with this some in the circuit efficiency section. I draw the circuit/software distinction around architectural prior and learning algorithms (genetic/innate) vs acquired knowledge/skills (cultural).
I used that in response to you saying “but if neurons are less repurposable and rearrangeable than transistors,”, which I do believe is actually nonsense, because neural circuits literally dynamically rewire themselves, which allows the flexibility of FPGAs (for circuit learning) combined with the efficiency of ASICs, and transistors are fixed circuits not dynamically modifiable at all.
If I was to try and steelman your position, it is simply that we can not be sure how efficiently the brain utilizes the potential of its supposed synaptic computational power.
To answer that question, I have provided some of the relevant arguments in my past writing, but at this point given the enormous success of DL (which I predicted well in advance) towards AGI and the great extent to which it has reverse engineered the brain, combined with the fact that moore’s law shrinkage is petering out and the brain remains above the efficiency of our best accelerators, entirely shifts the burden on to you to write up detailed analysis/arguments as to how you can explain these facts.
I think there’s just not that much to explain, here—to me, human-level cognition just doesn’t seem that complicated or impressive in an absolute sense—it is performed by a 10W computer designed by a blind idiot god, after all.
The fact that current DL paradigm methods inspired by its functionality have so far failed to produce artificial cognition of truly comparable quality and efficiency seems more like a failure of those methods rather than a success, at least so far. I don’t expect this trend to continue in the near term (which I think we agree on), and grant you some bayes points for predicting it further in advance.
I was actually referring to the flexibility and re-arrangability at design time here. Verilog and Cadence can make more flexible use of logic gates and transistors than the brain can make of neurons during a lifetime, and the design space available to circuit designers using these tools is much wider than the one available to evolution.