If you think there is a reasonable chance that memristors will not be competitive with other technologies, isn’t that the same as admitting that they might not be that much of an advance?
Memristors are not essential for an artificial cortex to be built, not at all. They will give perhaps one order of magnitude performance improvement over using just transistors/capacitors if they work out as expected. A bandwidth breakthrough such as optical interconnects is more important for the massive million fold speed advantage.
Of course the first artificial cortex-like systems will probably be built at much more modest speeds: realtime to just a few dozen times accelerated. The key point is that once we know how to make them, there is a straightforward engineering path that leads to accelerating them massively.
Any analogies to neurons are far too imprecise to allow building some sort of silicon-analogy-brain, since analogies only work if the underlying behaviour is actually the same.
Memristors are just a building block element. You can build a cortex out of regular circuit elements as well, memristors are just better pieces (specialized legos).
The functionality of circuits is not a function of the substrate (the building blocks), it’s a function of the circuit organization itself.
Basically something along the lines of the Blue Brain project
Vaguely, kind of—the Blue Brain project is the early precursor to building an artificial cortex. However it is far too detailed and too slow; it’s not a practical approach, not even close. It’s a learning project.
There is a subtle point I think you are missing. The problem is not one of processing power or even bandwidth but one of topology. Increasing the link bandwidth does not solve any problems nor does increasing the operations retired per clock cycle.
In parallel algorithms research, the main bottleneck is that traditional computer science assumes that the communication topology is a directly connected network—like the brain—but all real silicon systems are based on switch fabrics. For many years computer science simplified the analysis by treating these as interchangeable when they are not, and the differences from an algorithm design standpoint become very apparent once parallelism exceeds a certain, relatively low threshold.
The real limitation is that humans currently have very limited ability to design parallel algorithms from the theoretical assumption of a switch fabric. There are two ways to work around this. The first involves inventing a scalable direct-connect computing architecture (not any time soon), and the second involves developing a new body of computer science that scales on switch fabrics (currently a topic of research at a couple places).
Topology is the ultimate scalability problem; it manifests in multiple forms such as interconnect, the memory bottleneck, and so on.
If you could magically increase the memory/link bandwidth and operations retired per clock cycle to infinity, that would solve the hard problems. 2D topology and the wide separation of memory and computation limit the link bandwidth and operations per clock.
The brain employs massive connectivity in 3D but is still obviously not fully connected, and even it has to employ some forms of switching/routing for selective attention and other component algorithms.
The general topological problem: a von Neumann design on a 2D topology with separate logic and memory, divided into N logic gates and M memory gates, can access about sqrt(M) of its memory bits per clock, and has similar sublinear scaling in bit ops/clock.
Then factor in a minimum core logic size to support the desired instruction capability and the effects of latency, and you get our current designs. If you are willing to make much smaller, very limited instruction set ASICs and mix them in with memory modules, you can maximize the performance of a 2D design for some problems, but it’s still not amazingly better.
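The sqrt(M) scaling above is pure geometry and can be sketched in a few lines (illustrative figures only; nothing below comes from a real chip):

```python
import math

def bits_per_clock(m_memory_bits: int) -> float:
    """On a 2D layout, a memory of M bits occupies an area ~M, so the
    boundary where logic can tap it is ~sqrt(M). Bandwidth therefore
    scales with the perimeter, not the area."""
    return math.sqrt(m_memory_bits)

# Quadrupling the memory only doubles the bits reachable per clock:
for m in (1_000_000, 4_000_000, 16_000_000):
    print(f"M = {m:>10,} bits -> ~{bits_per_clock(m):,.0f} bits/clock")
```

This sublinear scaling is exactly the gap the argument turns on: memory grows with area while access grows with perimeter.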
I don’t see this as something that you can magically solve with a new body of computer science. The theoretical world does need to factor in this additional complexity, but in the real world engineers already design to the real bandwidth/link constraints.
A switch fabric is a necessity with the very limited scaling you get in 2D (where memory/computation scales in 2D but transit scales in 1D). It’s a geometry issue.
The ultimate solution is to start moving into 3D so you can scale link density with surface area instead of perimeter. Of course then the heat issue is magnified, but that research is already under way.
A complete theoretical framework of parallel computation should be geometric, and deal with mapping abstract dimensionless algorithms onto physical 3D computational geometric networks. The 1D Turing machine abstraction is a complete distraction in that sense.
You are missing the point. There are hyper-dimensional topological solutions that can be efficiently implemented on vanilla silicon that obviate your argument. There is literature to support the conjecture even if there is not literature to support the implementation. Nonetheless, implementations are known to have recently been developed at places like IBM Research that have been publicly disclosed to exist (if not the design). (ObDisclosure: I developed much of the practical theory related to this domain—I’ve seen running code at scale). Just because the brain exists in three dimensions does not imply that it is a 3-dimensional data model any more than analogous things are implied on a computer.
It is not an abstraction, you can implement these directly on silicon. There are very old theorems that allow the implementation of hyper-dimensional topological constructs on vanilla silicon (since the 1960s), conjectured to support massive pervasive parallelism (since the 1940s), the reduction to practice just isn’t obvious and no one is taught these things. These models scale well on mediocre switch fabrics if competently designed.
Basically, you are extrapolating a “we can’t build algorithms on switch fabrics” bias improperly and without realizing you are doing it. State-of-the-art parallel computer science research is much more interesting than you are assuming. Ironically, the mathematics behind it is completely indifferent to dimensionality.
You are missing the point. There are hyper-dimensional topological solutions that can be efficiently implemented on vanilla silicon that obviate your argument. There is literature to support the conjecture even if there is not literature to support the implementation.
I’m totally missing how a “hyper-dimensional topological solution” could get around the physical limitation of being realized on a 2D printed circuit. I guess if you use enough layers?
Do you have a link to an example paper about this?
It is analogous to how you can implement a hyper-cube topology on a physical network in normal 3-space, which is trivial. Doing it virtually on a switch fabric is trickier.
Hyper-dimensionality is largely a human abstraction when talking about algorithms; a set of bits can be interpreted as being in however many dimensions is convenient for an algorithm at a particular point in time, which follows from fairly boring maths e.g. Morton’s theorems. The general concept of topological computation is not remarkable either, it has been around since Tarski, it just is not obvious how one reduces it to useful practice.
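The point that a set of bits can be read in however many dimensions is convenient can be demonstrated with Z-order (Morton) interleaving; a minimal sketch (my own toy encoding, not anyone’s production design):

```python
def morton_encode(coords, bits=16):
    """Interleave the bits of n coordinates into one flat integer."""
    n = len(coords)
    code = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            code |= ((c >> b) & 1) << (b * n + d)
    return code

def morton_decode(code, n, bits=16):
    """Inverse: split one flat integer back into n coordinates."""
    coords = [0] * n
    for b in range(bits):
        for d in range(n):
            coords[d] |= ((code >> (b * n + d)) & 1) << b
    return coords

# One flat bit-string, two readings: the same integer decodes as a
# 2D point or as a 4D point, depending on the dimensionality chosen.
z = morton_encode([3, 5])
print(morton_decode(z, 2))  # [3, 5]
print(morton_decode(z, 4))  # the same bits reinterpreted in 4D
```

The dimensionality lives entirely in the interpretation, not in the storage, which is the sense in which it is “largely a human abstraction.”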
There is no literature on what a reduction to practice would even look like, but it is a bit of an open secret in the world of large-scale graph analysis that the very recent ability of a couple companies to parallelize graph analysis is based on something like this. Graph analysis scalability is closely tied to join algorithm scalability—a well-known hard-to-parallelize operation.
Memristors are not essential for an artificial cortex to be built, not at all. They will give perhaps one order of magnitude performance improvement over using just transistors/capacitors if they work out as expected.
The functionality of circuits is not a function of the substrate (the building blocks), it’s a function of the circuit organization itself.
This is exactly my point. If memristors are cool but nonessential, why are they mentioned so prominently? You made it seem like they were more important than they are.
A bandwidth breakthrough such as optical interconnects is more important for the massive million fold speed advantage.
How confident are we that this is close? What if there isn’t physically enough room to connect everything using the known methods?
Vaguely, kind of—the Blue Brain project is the early precursor to building an artificial cortex. However it is far too detailed and too slow; it’s not a practical approach, not even close. It’s a learning project.
Well yes, obviously not in its current state. It might not be too detailed though; we don’t know how much detail is necessary.
How confident are we that this is close? What if there isn’t physically enough room to connect everything using the known methods?
Using state-of-the-art interconnect available today, you’d probably be limited to something much more modest, like a 100-1000x max speedup. Of course I find it highly likely that interconnect will continue to improve.
It might not be too detailed though; we don’t know how much detail is necessary.
It’s rather obviously too detailed from the perspective of functionality. Blue Brain is equivalent to simulating a current CPU at the molecular level. We don’t want to do that, we just want to rebuild the CPU’s algorithms in a new equivalent circuit. Massive difference.
Using state-of-the-art interconnect available today, you’d probably be limited to something much more modest, like a 100-1000x max speedup. Of course I find it highly likely that interconnect will continue to improve.
That’s really interesting! Is there a prototype of this?
It’s rather obviously too detailed from the perspective of functionality. Blue Brain is equivalent to simulating a current CPU at the molecular level. We don’t want to do that, we just want to rebuild the CPU’s algorithms in a new equivalent circuit. Massive difference.
There is a difference. We know that the molecular-scale workings of a CPU don’t matter because it was designed by humans who wouldn’t be able to get the thing to work if they needed molecular precision. Evolution faces very different requirements. Intuitively, I think it is likely that some things can be optimized out, but I think it is very easy to overestimate here.
Using state-of-the-art interconnect available today, you’d probably be limited to something much more modest, like a 100-1000x max speedup…
That’s really interesting! Is there a prototype of this?
A prototype of what? The cortex has roughly 20 billion neurons organized into perhaps a million columns. The connections follow the typical inverse power law with distance, and most of the connectivity is fairly local. Assuming about 5% of the connections are long-distance inter-regional, an average firing rate of about one spike per second, and efficient encoding, this gives on the order of one GB/s of aggregate inter-regional bandwidth. This isn’t that much. There is a ludicrous amount of wiring when you open up the brain (all the white matter), but each connection is very slow.
So this isn’t a limiting factor for real-time simulation. The bigger limiting factor there is the memory bottleneck of just getting massive quantities of synaptic data into each GPU’s local memory.
But if memristors or other techniques surmount that principal memory bandwidth limitation, the interconnect eventually becomes a limitation. A 1000x speedup would equate to 1 TB/s of aggregate interconnect bandwidth. This is still reasonable for a few hundred computers connected via the fastest current point-to-point links, such as 100 Gb Ethernet (roughly 10 GB/s each × 100 node-to-node links).
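The ballpark arithmetic above is easy to reproduce. In the sketch below, the neuron count, 5% long-distance fraction, and 1 Hz rate come from the thread; the one-byte-per-spike encoding cost is my own assumption:

```python
# Thread's figures: ~20e9 cortical neurons, ~5% long-distance
# inter-regional connections, ~1 Hz average firing rate.
neurons = 20e9
long_frac = 0.05
rate_hz = 1.0
bytes_per_spike = 1.0   # assumed: aggressively packed spike-event encoding

events = neurons * long_frac * rate_hz   # 1e9 inter-regional events/s
gbps = events * bytes_per_spike / 1e9    # aggregate GB/s at real time
print(f"real time: ~{gbps:.0f} GB/s; 1000x accelerated: ~{gbps:.0f} TB/s")
```

At real time the traffic is on the order of a GB/s; scaling to 1000x real time turns the same traffic into the ~1 TB/s figure quoted above.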
It’s rather obviously too detailed from the perspective of functionality. Blue Brain is equivalent to simulating a current CPU at the molecular level. We don’t want to do that, we just want to rebuild the CPU’s algorithms in a new equivalent circuit. Massive difference.
We know that the molecular-scale workings of a CPU don’t matter because it was designed by humans who wouldn’t be able to get the thing to work if they needed molecular precision.
Intuitively, I think it is likely that some things can be optimized out, but I think it is very easy to overestimate here.
If you are reverse engineering a circuit, you may initially simulate it at a really low level, perhaps even the molecular level, to get a good understanding of how its logic family and low-level dynamics work. But the only point of that is to figure out what the circuit is doing. Once you figure that out you can apply those principles to build something similar.
I’d say at this point we know what the cortex does at the abstract level: hierarchical Bayesian inference. We even know the specific types of computations it does to approximate this—for example, see the work of Poggio’s CBCL group at MIT. They have built computational models of the visual cortex that are closing in on completeness in terms of the main computations the canonical cortical circuit can perform.
So we do know the principal functionality and underlying math of the cortex now. The network-level organization above that is still less understood, but understanding the base level allows us to estimate the computational demands of creating a full cortex, and it’s basically just what you’d expect (very roughly one low-precision multiply-add per synapse weight per update).
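Under that one-multiply-add-per-synapse-per-update accounting, the total compute for a real-time cortex is a one-line estimate; the synapse count and update rate below are my assumptions, not figures from the thread:

```python
neurons = 20e9         # cortical neurons (thread's figure)
syn_per_neuron = 5e3   # assumed average synapse count per neuron
update_hz = 100        # assumed update rate of the simulated network

# One low-precision multiply-add per synapse weight per update:
mads_per_s = neurons * syn_per_neuron * update_hz
print(f"~{mads_per_s:.0e} low-precision multiply-adds per second")
```

With these assumptions the estimate lands around 1e16 multiply-adds per second for real time; any acceleration factor multiplies this directly.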
A computer that used the most advanced interconnects available today to be more parallel than normal computers.
If you are reverse engineering a circuit, you may initially simulate it at a really low level, perhaps even the molecular level, to get a good understanding of how its logic family and low-level dynamics work. But the only point of that is to figure out what the circuit is doing. Once you figure that out you can apply those principles to build something similar.
The only reason this works is because humans built circuits. If their behaviour was too complex, we would not be able to design them to do what we want. A neuron can use arbitrarily complex calculations, because evolution’s only requirement is that it works.
The only reason this works is because humans built circuits. If their behaviour was too complex, we would not be able to design them to do what we want.
Quite so, but…
A neuron can use arbitrarily complex calculations, because evolution’s only requirement is that it works.
Ultimately this is all we care about as well.
We do simulate circuits at the lowest level now to understand functionality before we try to build them, and as our simulation capacity expands we will be able to handle increasingly complex designs and move into the space of analog circuits. Digital ASICs for AGI would probably come well before that, of course.
Really it’s a question of funding. Our current designs have tens of billions of dollars of industry momentum behind them.
A neuron can use arbitrarily complex calculations, because evolution’s only requirement is that it works.
Ultimately this is all we care about as well.
No, we have another requirement: the state of the system must separate into relevant and irrelevant variables, so that we can speed up the process by relying only on the relevant variables. Nature does not need to work this way. It might, but we only have experience with human-made computers, so we cannot be sure how much of the information can be disregarded.