Beyond that, it seems TensorFlow and PyTorch don’t even bother to use Strassen’s algorithm in place of the standard O(N^3) matrix multiplication (or perhaps something Strassen-like is used in the low-level GPU circuits?).
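For readers who have not seen it, here is a minimal sketch of what Strassen's algorithm does differently from the schoolbook method. It replaces the 8 block multiplications of a 2x2 block product with 7, which gives roughly O(N^2.81) instead of O(N^3), at the price of many extra block additions and temporary matrices. This is an illustration only, not how TensorFlow or PyTorch implement anything. The function names (`naive_matmul`, `strassen`, and the helpers) and the `cutoff` parameter are made up for this example, and the sketch assumes square matrices whose size is a power of two.

```python
def naive_matmul(a, b):
    """Schoolbook O(N^3) product of two square matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=2):
    """Strassen product. Assumption: len(a) is a power of two.

    Below `cutoff` (a hypothetical tuning knob) it falls back to the
    schoolbook method, because the recursion overhead dominates for
    small blocks.
    """
    n = len(a)
    if n <= cutoff:
        return naive_matmul(a, b)

    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    # Seven recursive block products instead of eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)

    # Recombine the seven products into the four output quadrants.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

With integer entries the two functions should agree exactly, so `strassen(a, b) == naive_matmul(a, b)` is a quick sanity check. With floating-point entries the results can differ slightly, because the additions and subtractions are performed in a different order.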