Neural networks seem like a good fit for high-latency clusters. If you split the nodes across 100 clusters during training and the network has ten layers, each cluster might take 0.001s to process a single sample. So the processing time per cluster is maybe 100-1000 times less than the total latency, which is acceptable if you have 10,000,000 samples and can tolerate some weight updates arriving a bit out of order. And if you only need the forward pass of the network, that's the ideal case, since there are no state updates at all.
In general, long-running computations tend to be either stateless or to have state that changes slowly relative to the latency, so this kind of parallelism can work.
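Here's a minimal sketch of what I mean by tolerating out-of-order updates: plain SGD on a toy regression problem, where each gradient is computed against a parameter copy that is several steps stale, standing in for the network latency. The worker count, staleness, batch size, and learning rate are made-up illustration numbers, not anything measured.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: linear regression on synthetic data.
n_samples, n_features = 10_000, 20
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.01 * rng.normal(size=n_samples)

def grad(w, idx):
    """Mini-batch gradient of mean squared error at weights w."""
    xb, yb = X[idx], y[idx]
    return 2.0 * xb.T @ (xb @ w - yb) / len(idx)

staleness = 10      # hypothetical: updates arrive ~10 steps late (latency >> compute)
batch_size = 32     # hypothetical batch size per cluster
lr = 0.01
steps = 2_000

w = np.zeros(n_features)
history = [w.copy()]    # past parameter versions a slow cluster might have read

for step in range(steps):
    # Each gradient is computed against a stale copy of the weights, simulating
    # the delay between a cluster reading parameters and its update arriving.
    stale_w = history[max(0, len(history) - 1 - staleness)]
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    w = w - lr * grad(stale_w, idx)
    history.append(w.copy())

loss = np.mean((X @ w - y) ** 2)
print(f"final mean squared error: {loss:.4f}")  # still converges despite stale updates
```

This only simulates the staleness, not the actual cluster communication, but it shows why a bit of delay in the weight updates doesn't wreck training when there's plenty of data.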
Please PM me a draft of your fighting aging article if you want to—I can read it and offer feedback