Also, arguably, Groq’s dataflow architecture is more or less this, and there wouldn’t be too much difference with Cerebras either for an on-chip NN. The problem is, the control flow you refer to has largely already been removed from GPU/TPU-style accelerators, so the gains may not be that great. (The Etched.ai performance argument is not really about ‘removing unnecessary layers’, because layers like the OS/programming language etc. are already irrelevant, so much as it is about running the models in an entirely different sort of way that batches the necessary layers more efficiently, as I understand it.)
That’s Etched.ai.