The brain is already near optimal in terms of what can be done for 10 watts with any irreversible computer (this is relatively easy to show from wiring energy analysis).
I should probably rephrase the brain optimality argument, as it isn't just about energy per se. The brain is on the Pareto efficiency surface: it is optimal with respect to some complex tradeoffs between area/volume, energy, and speed/latency.
Energy is the dominant constraint, so the brain is much closer to those limits than to the others. The typical futurist understanding of the Landauer limit is not even wrong; it is way off, as I point out in my earlier reply below and related links.
A consequence of the brain being near optimal for energy of computation for intelligence, given its structure, is that it is also near optimal in terms of intelligence per switching event.
The brain computes with just around 10^14 switching events per second (10^14 synapses * 1 Hz average firing rate). That 1 Hz figure is something of an upper bound for the average firing rate.[1]
The typical synapse is very small, has a low SNR and is thus equivalent to a low-bit op, and only activates maybe 25% of the time.[2] We can roughly compare these minimal-SNR analog ops with the high-precision single-bit ops that digital transistors implement. The Landauer principle allows us to rate them as reasonably equivalent in computational power.
So the brain computes with just 10^14 switching events per second. That is essentially miraculous. A modern GPU uses perhaps 10^18 switching events per second.
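A rough sketch of that comparison (the synapse count and firing rate are the estimates above; the GPU transistor count, clock, and activity factor are illustrative assumptions chosen to land near the quoted ~10^18 figure):

```python
# Rough comparison of switching events per second, using the estimates in this
# thread. The GPU numbers below are illustrative assumptions, not measurements.

brain_synapses = 1e14        # ~10^14 synapses
avg_rate_hz = 1.0            # ~1 Hz average firing rate (roughly an upper bound)
brain_events = brain_synapses * avg_rate_hz          # ~1e14 events/s

gpu_transistors = 1e10       # ~10 billion transistors (rough, assumed)
gpu_clock_hz = 1e9           # ~1 GHz
activity_factor = 0.1        # assume ~10% of transistors switch each cycle
gpu_events = gpu_transistors * gpu_clock_hz * activity_factor   # ~1e18 events/s

print(f"brain: ~{brain_events:.0e} switching events/s")
print(f"GPU:   ~{gpu_events:.0e} switching events/s")
print(f"GPU/brain ratio: ~{gpu_events / brain_events:.0e}")
```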
So the important thing here is not just energy but overall circuit efficiency. The brain is remarkably efficient, and as far as we can tell near optimal, in its use of computation towards intelligence.
This explains why our best SOTA techniques in almost all AI are some version of brain-like ANNs (the key defining principle being search/optimization over circuit space). It predicts that the best we can do for AGI is to reverse engineer the brain. Yes eventually we will scale far beyond the brain, but that doesn’t mean that we will use radically different algorithms.
A consequence of the brain being near optimal for energy of computation for intelligence, given its structure, is that it is also near optimal in terms of intelligence per switching event.
So the brain computes with just 10^14 switching events per second.
What do you mean by "given its structure"? Does this still leave open the possibility that a brain with some differences in organization could get more intelligence out of the same number of switching events per second?
Similarly, I assume the same argument applies to all animal brains. Do you happen to have stats on the number of switching events per second for e.g. the chimpanzee?
EDIT: see this comment and this comment on reddit for some references on circuit efficiency.
Computers are circuits and thus networks/graphs. For primitive devices the switches (nodes) are huge, so they use up significant energy. For advanced devices the switches are not much larger than the wires, and the wire energy dominates. If you look at the cross section of a modern chip, it contains a hierarchy of metal layers of decreasing wire size, with the transistors at the bottom. A side-view section of the cortex looks similar, with vasculature and long-distance wiring taking the place of the upper metal layers.
The vast majority of the volume in both modern digital circuits and brain circuits consists of wiring. The transistors and the synapses are just tiny little things in comparison.
Modern computer memory systems have a wire energy efficiency of around 10^-12 to 10^-13 J/bit/mm. The limit for reliable signals is perhaps only 10x better. I think the absolute limit for unreliable bits is 10^-15 or so; I will check the citation for that when I get home. Wire energy efficiency for bandwidth is not improving at all and hasn't since the '90s. The next big innovation is simply moving the memory closer; that's about all we can do.
The minimum wire energy is close to that predicted by a simple model of a molecular wire in which each molecule-sized 1 nm section is a switch: 10^-19 to 10^-21 J per section times 10^6 sections per mm gives 10^-13 to 10^-15 J/bit/mm. In reality of course it's somewhat more complex: smaller wires actually dissipate more energy, but also require less energy to represent a signal.
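Making that arithmetic explicit (a minimal sketch, using the per-section energy range and 1 nm section size quoted above):

```python
# Molecular-wire model of minimum wire energy: treat each ~1 nm section of the
# wire as a switch, charging 1e-21 J (kT-scale, unreliable bit) to 1e-19 J
# (~100 kT-scale, reliable bit) per section.

energy_per_section_low = 1e-21    # J, unreliable-bit scale
energy_per_section_high = 1e-19   # J, reliable-bit scale
sections_per_mm = 1e-3 / 1e-9     # 1 mm of wire / 1 nm per section = 1e6

low = energy_per_section_low * sections_per_mm    # ~1e-15 J/bit/mm
high = energy_per_section_high * sections_per_mm  # ~1e-13 J/bit/mm
print(f"model wire energy: ~{low:.0e} to ~{high:.0e} J/bit/mm")
# Compare: modern memory systems sit at roughly 1e-13 to 1e-12 J/bit/mm.
```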
Also keep in mind that synapses are analog devices which require analog impulse inputs and outputs—they do more work than a single binary switch.
So Moore's law is ending and we are already pretty close to the limits of wire efficiency. If you add up the wiring paths in the brain you get a similar estimate. Axons/dendrites appear to be at least as efficient as digital wires and are thus near optimal. None of this should be surprising: biological cells are energy-optimal true nanocomputers. Neural circuits evolved from the bottom up; there was never a time at which they were inefficient.
However, it is possible to avoid wire dissipation entirely with some reversible signal path. Optics is one route, but photons and thus photonic devices are impractically large. The other option is superconducting circuits, which work in labs but have far too many disadvantages to be practical yet. Eventually cold superconducting reversible computers could bypass the energy issues, but that tech appears to be far off.
The other option is superconducting circuits, which work in labs but have far too many disadvantages to be practical yet. Eventually cold superconducting reversible computers could bypass the energy issues, but that tech appears to be far off.
What about just replacing the copper wire inside a conventional CMOS chip with a superconductor? It took some searching, but I managed to find a paper titled Cryogenically Cooled CMOS which talks about the benefits and feasibility of doing this. Quoting from the relevant section:
If lower interconnect resistance improves performance, the use of 'zero-resistance' superconductors should provide the ultimate in performance improvement. Unfortunately, although performance improvements would be expected, they would not be as great as the simplistic statement above suggests. Furthermore, several technical obstacles remain before high-temperature superconductors (HTS) can be effectively integrated with VLSI technology.

Actually, as we will see, the resistance of superconducting films is not truly zero, except in the limits of zero frequency or zero temperature. Nevertheless, at 77 K and 1 GHz, measurements on patterned YBa2Cu3O7-x (YBCO) films have already demonstrated surface resistances one to two orders of magnitude below those for Cu under the same conditions. Theoretical predictions for YBCO suggest four orders of magnitude would be possible. Unfortunately, good-quality (epitaxial) YBCO films grow best on perovskite substrates having high dielectric constants. Lanthanum aluminate (LaAlO3), which is a popular substrate for HTS microwave circuits, has a relative dielectric constant of 25. Assuming the same interconnect geometry, this makes all capacitances more than 6× greater than would be the case for a SiO2 dielectric. Thus, some of the low-resistance benefits of HTS films are cancelled by the high dielectric constants of their associated substrates.
So it looks like there’s no fundamental reason why it couldn’t be done, just a matter of finding the right substrate material and solving other engineering problems.
What about just replacing the copper wire inside a conventional CMOS chip with a superconductor?
That is the type of tech I was referring to with superconducting circuits as a precursor to fully reversible computing. From what I understand, if you chill everything down then you also change resistance in the semiconductor along with all the other properties, so it probably isn't as easy as just replacing the copper wires.
A room temperature superconductor circuit breakthrough is one of the main wild cards over the next decade or so. Cryogenic cooling is pretty impractical for mainstream computing.
So it looks like there’s no fundamental reason why it couldn’t be done, just a matter of finding the right substrate material and solving other engineering problems.
Yeah, it's just a question of timetables. If it's decades away, we have a longer period of stalled Moore's law during which AGI surpasses the brain slowly rather than rapidly.
From what I understand, if you chill everything down then you also change resistance in the semiconductor along with all the other properties, so it probably isn’t as easy as just replacing the copper wires.
From the sources I've read, there aren't any major issues running CMOS at 77 K; you only run into problems at lower temperatures, below about 40 K. I guess people aren't seriously trying this because it's probably not much harder to go directly to full superconducting computers (i.e., with logic gates made out of superconductors as well), which offers a lot more benefits. Here is an article about a major IARPA project pursuing that. It doesn't seem safe to assume that we'll get AGI before we get superconducting computers. Do you disagree? If so, can you explain why?
There was similar interest in superconducting chips about a decade ago, and it was pretty much the same story: DARPA/IARPA spearheading the research, with the major customer being US intelligence.
The 500 gigaflops per watt figure is about 100 times more computation/watt than on a current GPU—which is useful because it shows that about 99% of GPU energy cost is interconnect/wiring.
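The arithmetic behind that inference (the ~5 GFLOPS/W figure for a current GPU-based system is the Green500 number quoted elsewhere in this thread):

```python
# If superconducting logic reaches ~500 GFLOPS/W while a current GPU system
# manages ~5 GFLOPS/W, then switching the logic accounts for only ~1% of GPU
# energy per op; the remaining ~99% is interconnect/wiring and data movement.

gpu_gflops_per_watt = 5.0
superconducting_gflops_per_watt = 500.0

energy_per_op_gpu = 1.0 / gpu_gflops_per_watt
energy_per_op_sc = 1.0 / superconducting_gflops_per_watt
wiring_fraction = 1.0 - energy_per_op_sc / energy_per_op_gpu
print(f"implied interconnect share of GPU energy: ~{wiring_fraction:.0%}")  # ~99%
```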
In terms of viability and impact, it is still uncertain how much funding superconducting circuits will require to become competitive. And even if the tech is competitive in some markets, say for the NSA, that doesn't make it competitive for general consumer markets. Cryogenic cooling means these things will only work in very special data rooms, so the market is more niche.
The bigger issue though is total cost competitiveness. GPUs are sort of balanced in that energy cost is about half of the TCO (total cost of ownership). It is extremely unlikely that superconducting chips will be competitive in total cost of computation in the near future. All the various tradeoffs in a superconducting design and the overall newness of the tech imply lower circuit densities. A smaller market implies less research amortization and higher costs. Even if a superconducting chip used zero energy, it would still be much more expensive and provide fewer ops/$.
Once we run out of scope for further CPU/GPU improvements over the next decade, the TCO budget will shift increasingly towards energy, and these types of chips will become increasingly viable. So I'd estimate that the probability of impact in the next 5 years is small, but 10 years or more out it's harder to say. To make a more reliable forecast I'd need to read more on this tech and understand more about the costs of cryogenic cooling.
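A toy TCO comparison to illustrate the point (all numbers are hypothetical placeholders, not estimates of real costs):

```python
# Toy TCO (total cost of ownership) comparison. If energy is ~half of a GPU's
# TCO, a zero-energy chip can at best halve cost per op, and any hardware-cost
# or density penalty quickly eats that gain. All values below are hypothetical.

gpu_hw_cost, gpu_energy_cost = 1.0, 1.0   # normalized: energy ~ half of TCO
gpu_throughput = 1.0                      # normalized ops/s

sc_hw_cost_multiplier = 5.0   # hypothetical: exotic process, small market
sc_density_penalty = 0.5      # hypothetical: lower circuit density
sc_hw_cost = gpu_hw_cost * sc_hw_cost_multiplier
sc_energy_cost = 0.0          # even granting zero energy use
sc_throughput = gpu_throughput * sc_density_penalty

gpu_ops_per_dollar = gpu_throughput / (gpu_hw_cost + gpu_energy_cost)   # 0.50
sc_ops_per_dollar = sc_throughput / (sc_hw_cost + sc_energy_cost)       # 0.10
print(f"GPU ops/$: {gpu_ops_per_dollar:.2f}, superconducting ops/$: {sc_ops_per_dollar:.2f}")
```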
But very roughly, the net effect of this could be to add another leg to Moore's-law-style growth, at least for server computation.
I guess people aren't seriously trying this because it's probably not much harder to go directly to full superconducting computers (i.e., with logic gates made out of superconductors as well), which offers a lot more benefits.
It takes energy to maintain cryogenic temperatures, probably much more than the energy that would be saved by eliminating wire resistance. If I understand correctly, the interest in superconducting circuits is mostly in using them to implement quantum computation. Barring room-temperature superconductors, there are probably no benefits to using superconducting circuits for classical computation.
From the article I linked to:

Studies indicate the technology, which uses low temperatures in the 4-10 kelvin range to enable information to be transmitted with minimal energy loss, could yield one-petaflop systems that use just 25 kW and 100 petaflop systems that operate at 200 kW, including the cryogenic cooler. Compare this to the current greenest system, the L-CSC supercomputer from the GSI Helmholtz Center, which achieved 5.27 gigaflops-per-watt on the most-recent Green500 list. If scaled linearly to an exaflop supercomputing system, it would consume about 190 megawatts (MW), still quite a bit short of DARPA targets, which range from 20MW to 67MW.
ETA: 100 petaflops per 200 kW equals 500 gigaflops per watt, so it’s estimated to be about 100 times more energy efficient.
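The same arithmetic spelled out, using only the figures from the quote:

```python
# Arithmetic behind the 500 GFLOPS/W figure, using the numbers in the quote above.

petaflops = 100.0
power_kw = 200.0                       # includes the cryogenic cooler
gflops = petaflops * 1e6               # 1 PF = 1e6 GFLOPS
watts = power_kw * 1e3
gflops_per_watt = gflops / watts
print(f"projected efficiency: {gflops_per_watt:.0f} GFLOPS/W")        # 500

lcsc_gflops_per_watt = 5.27            # greenest conventional system in the quote
print(f"improvement over L-CSC: ~{gflops_per_watt / lcsc_gflops_per_watt:.0f}x")  # ~95x
```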
As the efficiency of a logically irreversible computer approaches the Landauer limit, its speed must approach zero, for the same reason that a heat engine's power output must approach zero as its efficiency approaches the Carnot limit.
I don’t have an equation at hand, but I wouldn’t be surprised if it turned out that biological neurons operate close to the physical limit for their speed.
EDIT:
I found this Physics Stack Exchange answer about the thermodynamic efficiency of human muscles.
Hmm… after more searching, I found this page, which says:
The faster the processor runs, the larger the energy required to maintain the bit in the predefined 1 or 0 state. You can spend a lot of time arguing about a sensible value but something like the following is not too unreasonable: The Landauer switching limit at finite (GHz) clock speed:
Energy to switch 1 bit > 100 k_B T ln(2)
So biological neurons still don't seem to be near the physical limit, since they fire at only around 100 Hz and, according to my previous link, dissipate millions to billions of times more than k_B T ln(2).
A 100 kT signal is only reliable for a distance of a few nanometers. The energy cost is all in pushing signals through wires. So the synapse signal is a million times larger than 100 kT in order to cross a distance of around 1 mm or so, which works out to about 10^-13 J per synaptic event. Thus 10 watts for 10^14 synapses and a 1 Hz rate. For a 100 Hz rate, the average distance would need to be less.
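A sketch of that budget (assuming k_B T at roughly body temperature; the ~1 mm distance, 10^14 synapses, and ~1 Hz rate are the round figures used in this thread):

```python
import math

# Synaptic wire-energy budget: a ~100 kT signal is reliable over only a few nm,
# so driving it ~1 mm costs roughly 1e6 times more per event.

k_B = 1.380649e-23                 # Boltzmann constant, J/K
T = 310.0                          # K, roughly body temperature
kT = k_B * T                       # ~4.3e-21 J
landauer_bit = kT * math.log(2)    # ~3e-21 J, minimum per irreversible bit erasure
reliable_signal = 100 * kT         # ~4.3e-19 J, reliable over only a few nm

per_event = reliable_signal * 1e6  # ~4e-13 J to push the signal ~1 mm (~1e6 nm)
brain_power = 1e-13 * 1e14 * 1.0   # using the rounded 1e-13 J/event figure: ~10 W

print(f"per synaptic event: ~{per_event:.0e} J (order 1e-13 J)")
print(f"implied compute power: ~{brain_power:.0f} W")
print(f"ratio to Landauer limit per bit: ~{per_event / landauer_bit:.0e}x")  # ~1e8
```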
Not my field of expertise, but I don't understand where this bound comes from. In this paper, for short erasure cycles, they find an exponential law, although they don't give the constants (I suppose they are system-dependent).
Do you have a citation for this? My understanding is that biological neural networks operate far from the Landauer Limit (sorry I couldn’t find a better citation but this seems to be a common understanding), whereas we already have proposals for hardware that is near that limit.
Ok, I guess whether the cryogenic cooling pays for itself depends on how big your computer is, due to the square-cube law: the cooling overhead scales with surface area while the computation scales with volume, so bigger computers would be at an advantage.