It’s to make the computational load easier to handle.
In principle, any neural net can be represented as a DAG (including RNNs, by unrolling). This makes automatic differentiation nearly trivial to implement.
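To make that concrete, here’s a minimal sketch of reverse-mode autodiff over such a DAG, where every node is a single scalar operation (toy code for illustration, not any real library’s implementation):

```python
# Minimal reverse-mode autodiff over a DAG of scalar ops (illustrative sketch only).
class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value        # forward result
        self.parents = parents    # upstream nodes in the DAG
        self.grad_fns = grad_fns  # local derivative w.r.t. each parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other),
                    (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (lambda g, o=other: g * o.value, lambda g, s=self: g * s.value))

    def backward(self):
        # Topologically order the DAG, then push gradients from the output back to the inputs.
        order, seen = [], set()
        def visit(n):
            if n not in seen:
                seen.add(n)
                for p in n.parents:
                    visit(p)
                order.append(n)
        visit(self)
        self.grad = 1.0
        for n in reversed(order):
            for p, gfn in zip(n.parents, n.grad_fns):
                p.grad += gfn(n.grad)

# Example: y = x*x + x, so dy/dx = 2x + 1 = 7 at x = 3
x = Node(3.0)
y = x * x + x
y.backward()
print(y.value, x.grad)  # 12.0 7.0
```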
It’s very slow, though, if every node is a single arithmetic operation. So typically each node bundles many operations that are performed at once, like a matrix multiplication or a convolution. This is what is normally called a “layer.” Chunking the computations this way makes it much easier to load them onto a GPU.
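For example, in PyTorch a whole matrix multiply is recorded as a single node in the autograd graph (illustrative snippet; the exact grad_fn name can vary by version):

```python
import torch

x = torch.randn(32, 128, requires_grad=True)  # a batch of 32 inputs
W = torch.randn(128, 64, requires_grad=True)  # weights of one "layer"

# The entire matrix multiply becomes ONE node in the autograd graph,
# not 32*64*128 separate scalar multiply/add nodes.
y = x @ W
print(y.grad_fn)      # something like <MmBackward0 object at ...>

y.sum().backward()    # one chunked backward formula per node
print(W.grad.shape)   # torch.Size([128, 64])
```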
However, even these chunked operations can still be differentiated with a single closed-form rule each, e.g. the one for matrix multiplication. So the network is still effectively a DAG even when it is organized into layers, just with coarser nodes. (IIRC, this is how libraries like PyTorch work.)
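Concretely, the “one formula” for a matmul node C = A·B is dL/dA = (dL/dC)·Bᵀ and dL/dB = Aᵀ·(dL/dC). A quick sketch with a finite-difference sanity check (function and variable names here are just for illustration):

```python
import numpy as np

def matmul_backward(A, B, dC):
    """Closed-form gradients for C = A @ B, given the upstream gradient dL/dC."""
    dA = dC @ B.T
    dB = A.T @ dC
    return dA, dB

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 5))
dC = np.ones((3, 5))   # pretend the loss is L = sum(C), so dL/dC is all ones

dA, dB = matmul_backward(A, B, dC)

# Finite-difference check on one entry of A: dL/dA[0,0] ~ (L(A+eps) - L(A)) / eps
eps = 1e-6
A_pert = A.copy()
A_pert[0, 0] += eps
numeric = ((A_pert @ B).sum() - (A @ B).sum()) / eps
print(dA[0, 0], numeric)   # should agree to several decimal places
```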