However, consider model parallelism, splitting different layers of the graph across the nodes.
Nitpick: this would be more accurately described as pipeline parallelism. The term "model parallelism" is ambiguous: it's sometimes used as an umbrella term for everything other than data parallelism, but it's typically taken to mean splitting each layer across multiple nodes.
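To make the distinction concrete, here is a minimal sketch in plain Python, assuming a toy two-layer linear network whose "devices" are simulated by ordinary function calls. The weights, the helper names (`matvec`, `device0`, `device1`, `tensor_parallel_layer`), and the row-wise sharding scheme are all illustrative assumptions, not taken from the post or from any particular framework.

```python
# Toy illustration only: "devices" are simulated by plain Python calls.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Two linear layers with small hypothetical integer weights.
W1 = [[1, 2], [3, 4]]
W2 = [[0, 1], [1, 1]]
x = [1, 1]

# Reference: the whole model on a single device.
reference = matvec(W2, matvec(W1, x))

# Pipeline parallelism: each device owns one or more WHOLE layers,
# and activations are passed from one device to the next.
def device0(inp):
    return matvec(W1, inp)  # device 0 holds layer 1

def device1(hidden):
    return matvec(W2, hidden)  # device 1 holds layer 2

pipeline_out = device1(device0(x))

# Splitting each layer across devices (often called tensor parallelism):
# every device holds a slice of EVERY layer. Here each layer is sharded
# by output rows, and the partial outputs are concatenated (an all-gather).
def tensor_parallel_layer(W, inp, num_devices=2):
    shard = len(W) // num_devices
    parts = [matvec(W[d * shard:(d + 1) * shard], inp)
             for d in range(num_devices)]
    return [y for part in parts for y in part]

tensor_out = tensor_parallel_layer(W2, tensor_parallel_layer(W1, x))

print(reference, pipeline_out, tensor_out)
```

All three computations agree; what differs is which device holds which weights and what has to be communicated between devices.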
Fair clarification, I’ve edited the post.