No process can run independently at every time scale. There will be many clock cycles where every core is doing useful work, and many where some cores are waiting for a shared resource to become available. The same holds for parallelization across distinct hosts: they are CLEARLY parallel, but only until they need data or instructions from outside.
The question for this post is: how much is lost to wait times (both context switches and I/O waits), compared with some other way of organizing the work? The main optimizations available are around sizing the units of work correctly, so as to minimize the overhead of unnecessary synchronization points and wait times.
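To make the unit-of-work sizing point concrete, here is a minimal sketch (not from the post; all names and numbers are illustrative). It runs the same batch of trivial tasks through a process pool twice, once with one item per handoff and once with large chunks, so the difference in coordination overhead shows up directly in the wall-clock time.

```python
# Sketch: how unit-of-work size affects synchronization overhead.
# Illustrative only; workloads and worker counts are assumptions.
import time
from concurrent.futures import ProcessPoolExecutor

def work(n):
    # A trivial CPU-bound task; real workloads would do more per call.
    return sum(i * i for i in range(n))

def run(chunksize):
    items = [1000] * 2000
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as ex:
        # chunksize controls how many items each task carries: tiny chunks
        # mean more inter-process handoffs (more time spent waiting on the
        # queue), while larger chunks amortize that overhead.
        total = sum(ex.map(work, items, chunksize=chunksize))
    elapsed = time.perf_counter() - start
    return elapsed, total

if __name__ == "__main__":
    t_small, _ = run(chunksize=1)    # one handoff per item
    t_large, _ = run(chunksize=200)  # far fewer handoffs
    print(f"chunksize=1:   {t_small:.3f}s")
    print(f"chunksize=200: {t_large:.3f}s")
```

Both runs compute the same result; only the coordination cost differs. The right chunk size depends on how the per-item cost compares to the per-handoff cost, which is exactly the sizing question above.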