It’s to make the computational load easier to handle.
In principle, any neural net can be represented as a DAG (including RNNs, by unrolling). This makes automatic differentiation nearly trivial to implement.
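To make that concrete, here’s a minimal sketch of reverse-mode autodiff over such a DAG, where every node is a single scalar operation (toy code for illustration, not any real library’s implementation):

```python
# Minimal reverse-mode autodiff over a DAG of scalar ops (illustrative sketch only).
class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value        # forward result
        self.parents = parents    # upstream nodes in the DAG
        self.grad_fns = grad_fns  # local derivative w.r.t. each parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other),
                    (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (lambda g, o=other: g * o.value, lambda g, s=self: g * s.value))

    def backward(self):
        # Topologically order the DAG, then push gradients from the output back to the inputs.
        order, seen = [], set()
        def visit(n):
            if n not in seen:
                seen.add(n)
                for p in n.parents:
                    visit(p)
                order.append(n)
        visit(self)
        self.grad = 1.0
        for n in reversed(order):
            for p, gfn in zip(n.parents, n.grad_fns):
                p.grad += gfn(n.grad)

# Example: y = x*x + x, so dy/dx = 2x + 1 = 7 at x = 3
x = Node(3.0)
y = x * x + x
y.backward()
print(y.value, x.grad)  # 12.0 7.0
```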
It’s very slow, though, if every node is a single arithmetic operation. So typically each node bundles many operations that are performed at once, like a matrix multiplication or a convolution. This is what is normally called a “layer.” Chunking the computations this way makes it much easier to load them onto a GPU.
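For example, in PyTorch a whole matrix multiply is recorded as a single node in the autograd graph (illustrative snippet; the exact grad_fn name can vary by version):

```python
import torch

x = torch.randn(32, 128, requires_grad=True)  # a batch of 32 inputs
W = torch.randn(128, 64, requires_grad=True)  # weights of one "layer"

# The entire matrix multiply becomes ONE node in the autograd graph,
# not 32*64*128 separate scalar multiply/add nodes.
y = x @ W
print(y.grad_fn)      # something like <MmBackward0 object at ...>

y.sum().backward()    # one chunked backward formula per node
print(W.grad.shape)   # torch.Size([128, 64])
```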
However, even these chunked operations can still be differentiated with a single closed-form rule each, e.g. the one for matrix multiplication. So the network is still effectively a DAG even when it is organized into layers, just with coarser nodes. (IIRC, this is how libraries like PyTorch work.)
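Concretely, the “one formula” for a matmul node C = A·B is dL/dA = (dL/dC)·Bᵀ and dL/dB = Aᵀ·(dL/dC). A quick sketch with a finite-difference sanity check (function and variable names here are just for illustration):

```python
import numpy as np

def matmul_backward(A, B, dC):
    """Closed-form gradients for C = A @ B, given the upstream gradient dL/dC."""
    dA = dC @ B.T
    dB = A.T @ dC
    return dA, dB

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 5))
dC = np.ones((3, 5))   # pretend the loss is L = sum(C), so dL/dC is all ones

dA, dB = matmul_backward(A, B, dC)

# Finite-difference check on one entry of A: dL/dA[0,0] ~ (L(A+eps) - L(A)) / eps
eps = 1e-6
A_pert = A.copy()
A_pert[0, 0] += eps
numeric = ((A_pert @ B).sum() - (A @ B).sum()) / eps
print(dA[0, 0], numeric)   # should agree to several decimal places
```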