I’ve done some rather extensive investigation into the physical limits of computation and the future of Moore’s-Law-style progress. Here’s the general lowdown and my predictions:
Moore’s law for conventional computers is running into some key new asymptotic limits. The big constraint is energy, which is now almost entirely dominated by interconnect (and, to a lesser degree, passive leakage). For example, on a modern GPU a flop costs only about 10 pJ, but it costs roughly 30 pJ just to read a float from a register, and the cost goes up by orders of magnitude to read a float from local cache, remote cache, off-chip RAM, and so on. The second constraint is the economics of shrinkage. We may already be hitting a wall around the 20 nm to 28 nm nodes: we can continue to make transistors smaller, but the cost per transistor is no longer falling much (this affects logic transistors more than memory).
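To make the interconnect point concrete, here is a minimal back-of-the-envelope sketch in Python. The 10 pJ/flop and 30 pJ/register-read figures are the ones quoted above; the cache and DRAM figures are illustrative assumptions of my own, not measurements:

```python
# Rough energy budget per floating-point operation on a hypothetical modern GPU.
# Only the flop and register-read costs come from the discussion above; the
# cache/DRAM numbers are assumed placeholders for illustration.
PJ = 1e-12  # joules per picojoule

energy_pj = {
    "flop":          10,      # arithmetic itself (quoted above)
    "register read": 30,      # quoted above
    "local cache":   100,     # assumption: ~order of magnitude above registers
    "off-chip DRAM": 10_000,  # assumption: several orders of magnitude higher still
}

# Suppose each flop needs two operand reads; compare where the energy goes
# depending on where the operands live.
for source, e in energy_pj.items():
    if source == "flop":
        continue
    total = energy_pj["flop"] + 2 * e
    print(f"operands from {source:13s}: {total:7.0f} pJ/flop "
          f"-> {1 / (total * PJ):.2e} flops/J")
```

Even under these rough assumptions, the arithmetic itself is a small fraction of the total; data movement is where the joules go.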
3D stacking is the next big thing that can reduce interconnect distances, and using that plus optics for longer distances we can probably squeeze out another 10x to 30x improvement in ops/J. Nvidia and Intel are both going to use 3D RAM and optics in their next HPC parts. At that point we are getting close to the brain in terms of a limit of around 10^12 flops/J, which is a sort of natural limit for conventional computing. Low-precision ops don’t actually help much unless we are willing to run at much lower clock rates, because the energy cost comes from moving data (lower clock rates reduce latency pressure, which reduces register/interconnect pressure). Alternate materials (graphene, etc.) are a red herring; they are nowhere near as important as the interconnect issue, which is completely dominant at this point.
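As a sanity check, the ~10^12 flops/J figure is just the statement that the total energy per flop (arithmetic plus data movement) approaches ~1 pJ. A quick conversion, where the present-day per-flop totals are my own rough assumptions:

```python
# Convert energy-per-op to ops-per-joule: ops/J = 1 / (joules per op).
def ops_per_joule(pj_per_op: float) -> float:
    return 1.0 / (pj_per_op * 1e-12)

# If today an effective flop (arithmetic + data movement) costs ~30-100 pJ,
# then a 10x-30x improvement from 3D stacking + optics lands near 1-3 pJ per
# flop, i.e. close to the ~10^12 flops/J ceiling mentioned above.
for pj in (100, 30, 3, 1):
    print(f"{pj:5.1f} pJ/op -> {ops_per_joule(pj):.1e} ops/J")
```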
The next big improvement would be transitioning to a superconducting circuit basis, which in theory allows bits to move across the interconnect fabric at essentially zero energy cost. That appears to be decades away, and it would probably only make sense for cloud/supercomputer deployments where large-scale cryocooling is feasible. That could get us up to 10^14 flops/J, and up to 10^18 ops/J for low-precision analog ops. This tech could beat the brain in energy efficiency by a factor of roughly 100x to 1000x. At that point you are at the Landauer limit.
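For reference, the Landauer limit mentioned here is kT·ln 2 of dissipation per irreversible bit erasure. A quick computation (the constants are standard; the temperatures are just illustrative choices for room temperature versus a cryocooled system):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K

def landauer_joules_per_bit(temp_kelvin: float) -> float:
    """Minimum energy dissipated per irreversible bit erasure at temperature T."""
    return k_B * temp_kelvin * math.log(2)

for T in (300.0, 4.0):  # room temperature vs. a typical cryocooled temperature
    e = landauer_joules_per_bit(T)
    print(f"T = {T:5.1f} K: {e:.2e} J/bit -> {1/e:.2e} bit-erasures per joule")
```

Since an n-bit irreversible operation erases on the order of n bits, the 10^18 low-precision ops/J figure sits within a couple of orders of magnitude of this bound.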
The next steps past that will probably involve reversible computing and quantum computing. Reversible computing can reduce the energy cost of some types of operations arbitrarily close to zero. Quantum computing allows huge speedups for certain specific algorithms and computations. Both of these technologies appear to also require cryocooling (reversible computing without a superconducting interconnect just doesn’t make much sense, and quantum coherence works best near absolute zero). It is difficult to translate those concepts into a hard speedup figure, but it could eventually be very large, on the order of 10^6 or more.
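To give a flavor of why reversibility helps: the Landauer cost applies only to erased bits, and reversible logic avoids erasure by keeping enough outputs to reconstruct the inputs. A toy sketch (my own illustration, not something from the discussion above) of an AND built from a Toffoli gate:

```python
def toffoli(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Reversible CCNOT gate: flips c when a and b are both 1; a and b pass through."""
    return a, b, c ^ (a & b)

def reversible_and(a: int, b: int) -> tuple[int, int, int]:
    # Start the target bit at 0, so the third output is a AND b.
    # Because a and b are carried along, the mapping is invertible
    # (applying toffoli again undoes it), so no information is erased
    # and no Landauer cost has to be paid.
    return toffoli(a, b, 0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", reversible_and(a, b))
# Undoing: toffoli(*reversible_and(a, b)) == (a, b, 0), i.e. the inputs come back.
```

The catch, as noted above, is that only some algorithms can be restructured to exploit this without paying the cost back elsewhere.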
For information storage density, DNA is close to the molecular packing limit of roughly 1 bit/nm^3. A typical hard drive has a volume of around 30 cm^3, so DNA-level tech would give roughly 10^21 bytes for an ultimate hard drive; call it 10^20 bytes to leave room for the non-storage elements.
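The arithmetic behind that estimate, spelled out (the 30 cm^3 volume and ~1 bit/nm^3 density are the figures quoted above):

```python
# Volume-limited storage estimate for a DNA-density "ultimate hard drive".
BITS_PER_NM3 = 1.0            # quoted molecular packing limit, ~1 bit / nm^3
DRIVE_VOLUME_CM3 = 30.0       # quoted typical hard-drive volume
NM3_PER_CM3 = (1e7) ** 3      # 1 cm = 10^7 nm, so 1 cm^3 = 10^21 nm^3

total_bits = BITS_PER_NM3 * DRIVE_VOLUME_CM3 * NM3_PER_CM3
total_bytes = total_bits / 8
print(f"{total_bytes:.1e} bytes")  # ~4e21, i.e. roughly 10^21 bytes,
# with ~10^20 bytes as a conservative figure once non-storage overhead is allowed for.
```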
Very informative. Thanks. I’ve heard reversible computing mentioned a few times, but have never looked into it. Any recommendations for a quick primer, or is wikipedia going to be good enough?
The info on Wikipedia is OK. This MIRI interview with Mike Frank provides a good high-level overview. Frank’s various publications go into more detail; “Physical Limits of Computing” by Michael Frank in particular is pretty good.
There have been a few discussions here on LW about some of the implications of reversible computing for the far future. Not all algorithms can take advantage of reversibility, but reversible simulations in general appear to be feasible if they unwind time, and Monte Carlo simulation algorithms in particular could recycle entropy bits without unwinding time.
Thanks, I’ll check it out.