Humans (at least some) appear to be able to deal with these types of challenges given enough examples to cover the space and enough time to update models.
Given some computing network running a big VR AI sim, in theory the compute power can be used to run N AIs in parallel or one AI N times accelerated or anything in between. In practice latency and bandwidth overhead considerations will place limits on the maximum serial speedup.
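To make that tradeoff concrete, here is a minimal sketch. It assumes an idealized linear exchange between serial speedup and number of parallel copies, with a single hard cap standing in for the latency and bandwidth limits. The function name, the cap value, and the power-of-two steps are illustrative assumptions, not anything measured.

```python
def allocation_options(total_budget, max_serial_speedup):
    """List (speedup, copies) pairs for a fixed compute budget.

    total_budget: compute in units of "one AI at real-time speed" (the N above).
    max_serial_speedup: hypothetical cap imposed by latency/bandwidth overhead.
    Assumes an ideal linear tradeoff and ignores all other overheads.
    """
    options = []
    speedup = 1
    while speedup <= min(total_budget, max_serial_speedup):
        # Whatever is not spent on serial speed goes to parallel copies.
        options.append((speedup, total_budget // speedup))
        speedup *= 2
    return options


if __name__ == "__main__":
    for speedup, copies in allocation_options(1024, 64):
        print(f"{copies:5d} AIs at {speedup:3d}x real time")
```

With a budget of 1024 and a cap of 64x, the options run from 1024 copies at real time down to 16 copies at 64x. Past the cap, extra compute can only buy more copies.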
If you have hardware neurons running at 10^6 times biological speed (BTW, are you aware of HICANN, a chip that today implements neurons running 10^4 times faster than biological? See also this video presentation), would it make sense to implement a time-sharing system where one set of neurons is used to implement multiple AIs running at slower speed? Wouldn’t that create unnecessary communication costs (swapping AI mind states in and out of your chips) and coordination costs among the AIs?
would it make sense to implement a time-sharing system where one set of neurons is used to implement multiple AIs running at slower speed? Wouldn’t that create unnecessary communication costs
In short, if you don’t time share, then you are storing all synaptic data on the logic chip. Thus you need vastly more logic chips to simulate your model, and thus you have more communication costs.
There are a number of tradeoffs here that differ across GPUs vs neuro-ASICs like HICANN or IBM TrueNorth. The analog memristor approaches, if/when they work out, will have similar tradeoffs to neuro-ASICs. (For more on that and another viewpoint, see this discussion with the Knowm guy.)
GPUs are von Neumann machines that take advantage of the 10x or more difference between the per-transistor cost of logic and that of memory. Logic is roughly 10x more expensive, so it makes sense to have roughly 10x more memory bits than logic bits. For example, a GPU with 5 billion transistors might have 4 gigabytes of off-chip RAM.
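As a quick sanity check on that example, the ratio can be worked out directly. Counting each logic transistor as one "logic bit" and using decimal gigabytes are simplifications assumed here for illustration.

```python
# Rough figures from the example above (not a specific GPU model).
GPU_TRANSISTORS = 5_000_000_000      # treated as "logic bits" (an assumption)
OFFCHIP_RAM_BYTES = 4 * 10**9        # 4 GB, decimal gigabytes assumed

memory_bits = OFFCHIP_RAM_BYTES * 8
bits_per_transistor = memory_bits / GPU_TRANSISTORS

if __name__ == "__main__":
    print(f"memory bits per logic transistor: {bits_per_transistor:.1f}")
```

Under these assumptions the ratio comes out somewhat under 10, which fits the "roughly 10x" rule of thumb.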
So on the GPU (or any von Neumann machine), you are typically always time-swapping: simulating some larger circuit by swapping pieces in and out of memory.
The advantage of the neuro-ASIC is energy efficiency: synapses are stored on chip, so you don’t have to pay the price of moving data, which is most of the energy cost these days. The disadvantages are threefold: you lose most of your model flexibility, storing all your data on the logic chip is vastly more expensive per synapse, and you typically lose the flexibility to compress synaptic data (even basic weight sharing is no longer possible). Unfortunately these problems combine.
Let’s look at some numbers. The HICANN chip has 128k synapses in 50 mm^2, and their 8-chip reticle is thus equivalent to a mid-to-high-end GPU in die area. That’s about 1 million synapses in 400 mm^2. It can update all of those synapses at about 1 MHz, which is about 1 trillion synop-hz (synaptic operations per second).
A GPU using SOTA ANN simulation code can also hit about 1 trillion synop-hz, but with much more flexibility in the tradeoff between model size and speed. In particular, 1 million synapses isn’t really enough: most competitive ANNs trained today are in the 1 to 10 billion synapse range, which would cost about 1000 times more on HICANN, because it can only store about 1 million synapses per GPU-sized reticle, vs 1 billion or more for the GPU.
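The arithmetic behind those figures can be checked with a short sketch. All constants are the rough estimates quoted in this comment, not measured specifications, and the function names are just for illustration.

```python
# Back-of-envelope check of the HICANN numbers quoted above.
HICANN_SYNAPSES_PER_CHIP = 128_000   # per ~50 mm^2 chip
CHIPS_PER_RETICLE = 8                # ~400 mm^2, roughly one GPU die
UPDATE_RATE_HZ = 1_000_000           # ~1 MHz synapse update rate


def reticle_synapses():
    """Synapses stored on one 8-chip reticle."""
    return HICANN_SYNAPSES_PER_CHIP * CHIPS_PER_RETICLE


def throughput(synapses, rate_hz):
    """Synaptic operations per second ("synop-hz")."""
    return synapses * rate_hz


def reticles_needed(model_synapses, synapses_per_reticle):
    """Reticles required if every synapse must live on chip (ceiling division)."""
    return -(-model_synapses // synapses_per_reticle)


if __name__ == "__main__":
    per_reticle = reticle_synapses()
    print(f"synapses per reticle: {per_reticle:,}")
    print(f"throughput: {throughput(per_reticle, UPDATE_RATE_HZ):.3e} synop-hz")
    print(f"reticles for a 1e9-synapse model: {reticles_needed(10**9, per_reticle)}")
```

A 1 billion synapse model needs just under a thousand reticles when nothing can be swapped in from off-chip memory, which is where the "about 1000 times" figure comes from.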
IBM’s TrueNorth can fit more synapses on a chip: 256 million on a GPU-sized chip (5 billion transistors). But it runs slower, with a similar total synop-hz throughput. The GPU solutions are just far better overall, for now.
Apparently HICANN was designed before 2008, and uses a 180nm CMOS process, whereas modern GPUs are using 28nm. It seems to me that if neuromorphic hardware catches up in terms of economy of scale and process technology, it should be far superior in cost per neural event. And if neuromorphic hardware does win, it seems that the first AGIs could have a huge amortized cost per hour of operation, and still have a lower cost per unit of cognitive work than human workers, due to running much faster than biological brains.
It seems like this GPU vs neuromorphic question could have a large impact on how the Singularity turns out, but I haven’t seen any discussion of it until now. Do you have any other thoughts or references on this topic?
Apparently HICANN was designed before 2008, and uses a 180nm CMOS process, whereas modern GPUs are using 28nm.
That’s true, but IBM’s TrueNorth is 28 nm, with about the same transistor count as a GPU. It descends from earlier research chips on old nodes that were then scaled up to new nodes. TrueNorth can fit 256 million low-bit synapses on a chip, vs 1 million for HICANN (normalized for chip area). The 28 nm process has roughly 40x the transistor density. So my default hypothesis is that if HICANN were scaled up to 28 nm it would end up similar to TrueNorth in terms of density (although TrueNorth is weird in that it is intentionally much slower than it could be to save energy).
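A rough check of that scaling argument, assuming ideal area scaling (density proportional to the inverse square of the feature size), which real processes only approximate:

```python
def ideal_density_gain(old_nm, new_nm):
    """Ideal transistor-density gain from shrinking the process node."""
    return (old_nm / new_nm) ** 2


HICANN_SYNAPSES_180NM = 1_000_000      # per GPU-sized area, figure from above
TRUENORTH_SYNAPSES_28NM = 256_000_000  # low-bit synapses, figure from above

gain = ideal_density_gain(180, 28)
hicann_scaled = HICANN_SYNAPSES_180NM * gain
remaining_gap = TRUENORTH_SYNAPSES_28NM / hicann_scaled

if __name__ == "__main__":
    print(f"ideal density gain 180 nm -> 28 nm: {gain:.1f}x")
    print(f"HICANN scaled to 28 nm: {hicann_scaled:,.0f} synapses")
    print(f"remaining gap to TrueNorth: {remaining_gap:.1f}x")
```

The ideal gain is a little over 41x, in line with the "roughly 40x" above. That leaves a scaled HICANN within an order of magnitude of TrueNorth. The sketch does not model the leftover factor, which might come down to differences such as synapse precision.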
It seems to me that if neuromorphic hardware catches up in terms of economy of scale and process technology, it should be far superior in cost per neural event.
I expect this in the long term, but it will depend on how the end of Moore’s Law pans out. Also, current GPU code is not yet at the limits of software simulation efficiency for ANNs, and GPU hardware is still improving rapidly. It just so happens that I am working on a new type of ANN sim engine that is 10x or more faster than current SOTA for networks of interest. My approach could eventually be hardware accelerated. There are some companies already pursuing hardware acceleration of the standard algorithms, such as Nervana, which is targeting a similar speedup but through dedicated neural ASICs.
One thing I can’t stress enough is the advantage of programmable memory for storing weights: sharing and compressing weights helps solve many of the bandwidth problems the GPU would otherwise have.
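To illustrate how much weight sharing can save, here is a minimal sketch for a single convolutional layer. The layer dimensions are hypothetical, and stride 1 with "same" padding is assumed so that every output position reuses the same kernel.

```python
def conv_layer_counts(height, width, kernel, c_in, c_out):
    """Return (synapses, unique_weights) for one conv layer.

    Assumes stride 1 and 'same' padding, so there is one kernel
    application per spatial position. All sizes are illustrative.
    """
    unique_weights = kernel * kernel * c_in * c_out
    # Each spatial position reuses the same kernel weights.
    synapses = height * width * unique_weights
    return synapses, unique_weights


if __name__ == "__main__":
    synapses, weights = conv_layer_counts(56, 56, 3, 128, 128)
    print(f"connections (synapses): {synapses:,}")
    print(f"stored weights with sharing: {weights:,}")
    print(f"sharing factor: {synapses // weights}x")
```

With programmable memory only the shared weights need to be stored and fetched. Hardware with one physical synapse per connection would have to hold a separate copy for every connection, so in this example the storage differs by the number of spatial positions.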
It seems like this GPU vs neuromorphic question could have a large impact on how the Singularity turns out, but I haven’t seen any discussion of it until now. Do you have any other thoughts or references on this topic?
I don’t know how much it really affects outcomes. Whether one uses clever hardware or clever software, the brain is probably near or on the Pareto surface for statistical inference energy efficiency, and we will probably get close in the near future.
I don’t know how to deal with this myself, and I doubt whether people who claim to be able to deal with these scenarios are doing so correctly. I wrote about this in http://lesswrong.com/lw/g0w/beware_selective_nihilism/