Indeed, their proposed implementation on commodity hardware, called the “Sub-LInear Deep learning Engine” (SLIDE), outperforms AI-specialized hardware. Using SLIDE, Chen and colleagues achieve 3.5 times faster training on commodity hardware (a 44-core CPU) than on AI-specialized hardware (a Tesla V100). In collaboration with researchers from Intel, they improved performance even further, to a speedup of up to 7 times, by exploiting advanced features of commodity hardware (Daghaghi et al., 2021). With the venture-funded startup ThirdAI, Shrivastava and colleagues are now developing an “algorithmic accelerator for training deep learning models that can achieve or even surpass GPU-level performance on commodity CPU hardware” (Bolt 2021). While the software is still in closed alpha, a recent blog post announced the successful training of a 1.6-billion-parameter model on CPUs. This research suggests that AI researchers’ heavy reliance on AI-specialized hardware might not be necessary and that commodity hardware can achieve equal or greater performance.
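To give a sense of what SLIDE actually does, here is a minimal sketch of the core trick as described in the paper, not the real SLIDE code; all shapes and names below are invented for illustration. Each neuron’s weight vector is pre-hashed into locality-sensitive hash tables (here SimHash, i.e. signed random projections), and for each input only the neurons that collide with it are activated, so the full N×D matrix multiply is replaced by a few dot products:

```python
import numpy as np

# Illustrative sketch of SLIDE's core idea (not the actual SLIDE code):
# pre-hash every neuron's weight vector into LSH tables, then for each
# input look up only the neurons that collide with it and compute just
# those few dot products. All shapes here are made up for the demo.

rng = np.random.default_rng(0)
D, N = 256, 10_000                    # input dim, layer width
TABLES, BITS = 4, 8                   # LSH tables, hash bits per table

W = rng.standard_normal((N, D))                  # neuron weight vectors
planes = rng.standard_normal((TABLES, BITS, D))  # random hyperplanes

def simhash(v, t):
    """Hash v to a BITS-bit bucket id in table t via its sign pattern."""
    return int("".join("1" if p @ v > 0 else "0" for p in planes[t]), 2)

# Build the tables once, up front (SLIDE re-hashes periodically as the
# weights drift during training).
tables = [{} for _ in range(TABLES)]
for i, w in enumerate(W):
    for t in range(TABLES):
        tables[t].setdefault(simhash(w, t), []).append(i)

def sparse_forward(x):
    """Activate only neurons sharing a bucket with x in some table."""
    active = sorted({i for t in range(TABLES)
                     for i in tables[t].get(simhash(x, t), [])})
    return active, W[active] @ x      # tiny matmul over the candidates

x = rng.standard_normal(D)
active, acts = sparse_forward(x)
print(f"computed {len(active)}/{N} neuron activations")  # a few percent
```

As I understand it, the real system also restricts the gradient updates to the same small active set and parallelizes aggressively across CPU cores, which is where the claimed speedups over a GPU come from.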
SLIDE got a lot of attention at the time, but it’s worth remembering that a 44-core Intel Xeon CPU is hella expensive, and that these are recommendation models: an extremely narrow, albeit commercially important, niche whose extremely sparse arrays ought to be one of the worst cases for GPUs. Nevertheless, a 1.6b-parameter recommender model (which, like MoEs, is far less impressive than it sounds, because most of the parameters are in the embedding table; see the back-of-the-envelope below) is not even that large; a large recommender would be, say, “Persia: A Hybrid System Scaling Deep Learning Based Recommenders up to 100 Trillion Parameters”, Lian et al 2021 (or DLRM or RecPipe). None of them use exclusively CPUs; all rely heavily on “AI-specialized hardware”. I don’t expect ThirdAI to scale to Persia size with just commodity hardware (CPUs). When they do, I’ll pay more attention.
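To make the “mostly embedding” point concrete, a quick back-of-the-envelope with invented but plausible shapes:

```python
# Back-of-the-envelope (shapes are made up, but representative): why a
# "1.6 billion parameter" recommender is almost entirely embedding
# table, with the dense layers that do the actual compute a rounding
# error.

n_items, emb_dim = 25_000_000, 64         # hypothetical catalogue, dims
embedding_params = n_items * emb_dim      # = 1.6e9 parameters

mlp_widths = [emb_dim, 512, 256, 1]       # hypothetical scoring MLP
mlp_params = sum(a * b + b for a, b in zip(mlp_widths, mlp_widths[1:]))

total = embedding_params + mlp_params
print(f"embedding: {embedding_params:,} ({embedding_params/total:.2%})")
print(f"MLP:       {mlp_params:,} ({mlp_params/total:.2%})")
# embedding: 1,600,000,000 (99.99%); MLP: 164,865 (0.01%)
```

The embedding lookups are trivially parallel and memory-bound, which is exactly the regime where a big-RAM CPU box looks best and a GPU looks worst.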
GPU vs CPU aside, when you look at sparsity-exploiting hardware like Cerebras or the spiking neural net hardware, they all look very little like a CPU, or indeed like anything you’d run Firefox on. Running Doom on a toaster looks easy by comparison.
Yep, I agree, SLIDE is probably a dud. Thanks for the references! And my inside view is also that current trends will probably continue and most interesting stuff will happen on AI-specialized hardware.