There is a subtle point I think you are missing. The problem is not one of processing power or even bandwidth but one of topology. Increasing the link bandwidth does not solve any problems nor does increasing the operations retired per clock cycle.
In parallel algorithms research, the main bottleneck is that traditional computer science assumes the communication topology is a directly connected network—like the brain—but all real silicon systems are based on switch fabrics. For many years computer science simplified the analysis by treating these as interchangeable, but they are not, and from an algorithm-design standpoint the differences become very apparent once parallelism exceeds a relatively low threshold.
The real limitation is that humans currently have very limited ability to design parallel algorithms starting from the theoretical assumption of a switch fabric. There are two ways to work around this. The first is to invent a scalable direct-connect computing architecture (not any time soon); the second is to develop a new body of computer science that scales on switch fabrics (currently a topic of research at a couple of places).
Topology is the central, ultimate scalability problem; it manifests in multiple forms such as the interconnect, the memory bottleneck, and so on.
If you could magically increase the memory/link bandwidth and operations retired per clock cycle to infinity, that would solve the hard problems. 2D topology and the wide separation of memory and computation are what limit the link bandwidth and operations per clock.
The brain employs massive connectivity in 3D but is still obviously not fully connected, and even it has to employ some forms of switching/routing for selective attention and other component algorithms.
The general topological problem: a von Neumann design on a 2D topology, with logic and memory separated into N logic gates and M memory gates, can access only about sqrt(M) of its memory bits per clock, and has similarly sublinear scaling in bit ops per clock.
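A back-of-the-envelope sketch of that sqrt(M) claim (my own toy model, not from any specific paper): tile M memory bits as a roughly square 2D region, and note that the wires crossing its edge into the logic region scale with the side length, not the area.

```python
import math

def accessible_bits_per_clock(m_bits):
    """Toy model: M memory bits tiled as a ~sqrt(M) x sqrt(M) square.
    Wires crossing the tile's edge scale with the side length, so
    per-clock access is bounded by ~sqrt(M), not M."""
    side = math.isqrt(m_bits)  # edge length of the memory tile
    return side                # assume one bit per edge wire per clock

# Quadrupling the memory only doubles the accessible fraction's numerator,
# so the fraction of memory reachable per clock keeps shrinking:
for m in (1_000_000, 4_000_000, 16_000_000):
    bits = accessible_bits_per_clock(m)
    print(f"M={m:>10,} bits -> ~{bits:,} bits/clock ({bits / m:.4%} of memory)")
```

Under this (deliberately crude) model, growing M by 4x only grows per-clock access by 2x, which is the sublinear scaling described above.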
Then factor in a minimum core-logic size to support the desired instruction capability, plus the effects of latency, and you get our current designs. If you are willing to make much smaller, limited-instruction-set ASICs and mix them in with memory modules, you can maximize the performance of a 2D design for some problems, but it's still not amazingly better.
I don’t see this as something that you can magically solve with a new body of computer science. The theoretical world does need to factor in this additional complexity, but in the real world engineers already design to the real bandwidth/link constraints.
A switch fabric is a necessity with the very limited scaling you get in 2D (where memory/computation scales in 2D but transit scales in 1D). It’s a geometry issue.
The ultimate solution is to start moving into 3D so you can scale link density with surface area instead of perimeter. Of course then the heat issue is magnified, but that research is already under way.
A complete theoretical framework of parallel computation should be geometric, and should deal with mapping abstract, dimensionless algorithms onto physical 3D computational geometric networks. The 1D Turing machine abstraction is a complete distraction in that sense.
You are missing the point. There are hyper-dimensional topological solutions that can be efficiently implemented on vanilla silicon and that obviate your argument. There is literature to support the conjecture even if there is not literature to support the implementation. Nonetheless, implementations are known to have recently been developed at places like IBM Research and have been publicly disclosed to exist (if not the design). (ObDisclosure: I developed much of the practical theory related to this domain—I've seen running code at scale.) Just because the brain exists in three dimensions does not imply that it is a 3-dimensional data model, any more than analogous things are implied on a computer.
It is not an abstraction; you can implement these directly on silicon. There are very old theorems that allow the implementation of hyper-dimensional topological constructs on vanilla silicon (since the 1960s), conjectured to support massive pervasive parallelism (since the 1940s); the reduction to practice just isn't obvious, and no one is taught these things. These models scale well on mediocre switch fabrics if competently designed.
Basically, you are extrapolating a “we can’t build algorithms on switch fabrics” bias improperly and without realizing you are doing it. State-of-the-art parallel computer science research is much more interesting than you are assuming. Ironically, the mathematics behind it is completely indifferent to dimensionality.
You are missing the point. There are hyper-dimensional topological solutions that can be efficiently implemented on vanilla silicon and that obviate your argument. There is literature to support the conjecture even if there is not literature to support the implementation.
I’m totally missing how a “hyper-dimensional topological solution” could get around the physical limitation of being realized on a 2D printed circuit. I guess if you use enough layers?
Do you have a link to an example paper about this?
It is analogous to how you can implement a hypercube topology on a physical network in normal 3-space, which is trivial. Doing it virtually on a switch fabric is trickier.
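A minimal sketch of the "virtual hypercube on a flat network" idea (my own illustration; the labeling scheme is the standard one, not anything from the thread): on a switch fabric any node can address any other, so a hypercube edge can be purely a matter of addressing—two nodes are neighbors exactly when their labels differ in one bit.

```python
def hypercube_neighbors(node, dim):
    """Neighbors of `node` in a dim-dimensional hypercube:
    flip one address bit per dimension."""
    return [node ^ (1 << d) for d in range(dim)]

# With dim=4 we get a 16-node virtual hypercube. No physical hypercube
# wiring is needed; each "link" is just a unicast send on the fabric.
for n in (0, 5):
    print(n, "->", sorted(hypercube_neighbors(n, dim=4)))
```

The topology lives entirely in the address arithmetic, which is why it ports to a switch fabric at all; the tricky part alluded to above is making such virtual links perform well, not defining them.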
Hyper-dimensionality is largely a human abstraction when talking about algorithms; a set of bits can be interpreted as being in however many dimensions is convenient for an algorithm at a particular point in time, which follows from fairly boring maths, e.g. Morton’s theorems. The general concept of topological computation is not remarkable either; it has been around since Tarski. It just is not obvious how one reduces it to useful practice.
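The bit-interleaving trick behind that point can be shown concretely (a minimal sketch of Z-order/Morton coding, my own illustration): the same flat integer can be read back as coordinates in whatever number of dimensions you choose.

```python
def morton_encode(coords, bits):
    """Interleave coordinate bits into one flat integer (Z-order /
    Morton code). The dimensionality is an interpretation, not a
    property of the stored bits."""
    code, ndim = 0, len(coords)
    for b in range(bits):
        for d, c in enumerate(coords):
            code |= ((c >> b) & 1) << (b * ndim + d)
    return code

def morton_decode(code, ndim, bits):
    """De-interleave the same bits back into ndim coordinates."""
    coords = [0] * ndim
    for b in range(bits):
        for d in range(ndim):
            coords[d] |= ((code >> (b * ndim + d)) & 1) << b
    return coords

z = morton_encode([3, 5], bits=4)       # store a "2D" point flat
print(z, morton_decode(z, ndim=2, bits=4))
```

The same integer could just as well be decoded with `ndim=4` as a 4D point; the dimensional structure is chosen by the algorithm at read time, which is the sense in which hyper-dimensionality is an abstraction over plain bits.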
There is no literature on what a reduction to practice would even look like, but it is a bit of an open secret in the world of large-scale graph analysis that the very recent ability of a couple of companies to parallelize graph analysis is based on something like this. Graph-analysis scalability is closely tied to join-algorithm scalability—a well-known hard-to-parallelize operation.