Allow me to speculate wildly.
I don’t actually think this is going to make that big of a difference, at least for current AI research. The main reason is that I think the main hardware bottlenecks to better AI performance are performance/$, performance/W, and memory bandwidth. So far, most large-scale DL algorithms have shown almost embarrassingly parallel scaling, and a good amount of time is wasted just saving and reloading NN activations for the back-prop algorithm.
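To put a rough number on that activation traffic (my own back-of-envelope assumptions, not figures from the article), here’s what a single dense layer has to write out in the forward pass and read back in the backward pass:

```python
# Back-of-envelope sketch; the model shape below is hypothetical,
# chosen only for illustration.
batch = 64          # sequences per step (assumption)
seq_len = 512       # tokens per sequence (assumption)
d_model = 1024      # hidden width (assumption)
n_layers = 48       # depth (assumption)
bytes_per_val = 2   # fp16 activations

# Activations of one dense layer: stored during the forward pass,
# then re-read during the backward pass.
acts_per_layer = batch * seq_len * d_model * bytes_per_val

print(f"per layer:   {acts_per_layer / 1e6:.0f} MB written, then read back")
print(f"whole model: {n_layers * acts_per_layer / 1e9:.1f} GB of activation "
      f"traffic each way, every training step")
```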
This technology probably won’t lead to any major improvements in performance/$ or performance/W; those will have already come from dedicated DL chips such as Google’s TPUs, since this is essentially a really big dedicated DL chip. The major place for improvement is memory bandwidth, which according to the article is an impressive 9PB per second, about 10,000 times what’s on a V100 GPU. But with only 18GB of RAM, that’s going to severely constrain the size of models that can be trained, so I don’t think it will be useful for training better models.
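A quick sanity check on the 18GB point (my own assumptions about a standard mixed-precision Adam setup, nothing from the article): weights plus optimizer state alone eat into that budget fast, before you even count activations.

```python
# Rough fit check under assumed training state: fp16 weights, fp32 master
# weights, and two fp32 Adam moments (16 bytes per parameter total).
GB = 1024**3

def training_footprint_bytes(n_params: int) -> int:
    weights_fp16 = 2 * n_params
    master_fp32 = 4 * n_params
    adam_moments = 2 * 4 * n_params   # first and second moment estimates
    return weights_fp16 + master_fp32 + adam_moments

for n in (100e6, 500e6, 1.5e9):       # hypothetical model sizes
    gb = training_footprint_bytes(int(n)) / GB
    verdict = "fits" if gb <= 18 else "does NOT fit"
    print(f"{n/1e6:>6.0f}M params -> {gb:5.1f} GB of training state "
          f"({verdict} in 18 GB, before activations)")
```

Under those assumptions, anything much past a billion parameters already blows the on-chip budget on optimizer state alone.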
Might be good for inference though.
They also claim increased energy efficiency, since they skip the useless multiplications by zero that are common in matrix multiplication.
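As a toy illustration of why that helps (this is not their actual hardware mechanism, just the arithmetic): with sparse activations, most of the multiplies in a dense matmul have a zero operand and contribute nothing, so a design that detects and skips them saves the corresponding work.

```python
# Count how many multiplies in a dense matmul actually have a non-zero
# operand, under an assumed ~70% activation sparsity (e.g. after ReLU).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256))
A[rng.random(A.shape) < 0.7] = 0.0      # hypothetical sparsity level
B = rng.standard_normal((256, 256))

total_mults = A.shape[0] * A.shape[1] * B.shape[1]     # dense multiply count
useful_mults = int(np.count_nonzero(A)) * B.shape[1]   # multiplies with a non-zero A operand

print(f"dense multiplies:  {total_mults:,}")
print(f"useful multiplies: {useful_mults:,} "
      f"({100 * useful_mults / total_mults:.0f}% of the dense count)")
```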