Isn’t this false nowadays, given that everyone has multi-core CPUs?
Nope, still applies. Even if you have more cores than running threads (remember programs are multi-threaded nowadays) and your OS could just hand one or more cores over indefinitely, it’ll generally still do a regular context switch to the OS and back several times per second.
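If you want to watch this from user space on Linux, procfs keeps per-process counters of how often the scheduler actually moved a process off the CPU. Here’s a rough Python sketch (Linux-only, and it assumes the voluntary_ctxt_switches / nonvoluntary_ctxt_switches fields that /proc/self/status exposes):

```python
# Rough sketch (Linux-only): count how often the scheduler moved this
# process off the CPU while it was running a pure busy loop.
import time

def ctx_switches():
    # /proc/self/status contains voluntary_ctxt_switches and
    # nonvoluntary_ctxt_switches lines on Linux.
    counts = {}
    with open("/proc/self/status") as f:
        for line in f:
            if "ctxt_switches" in line:
                key, value = line.split(":")
                counts[key.strip()] = int(value)
    return counts

before = ctx_switches()
deadline = time.monotonic() + 2.0
x = 0
while time.monotonic() < deadline:  # CPU-bound, never blocks voluntarily
    x += 1
after = ctx_switches()

for key, start in before.items():
    print(key, after[key] - start)
```

On a loaded machine the nonvoluntary count climbs even though the loop never blocks; on a mostly idle box it may stay near zero, since the timer tick only forces a switch away when something else wants the core.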
And another thing that’s not worth its own comment but puts some numbers on the fuzzy “rapidly” from the article:
It’s just that [the process switching] happens rapidly.
For Windows, that’s traditionally 100 Hz, i.e. 100 timer ticks (and thus potential context switches) per second. For Linux it’s a kernel config parameter (CONFIG_HZ), and you can choose between options from 100 to 1000 Hz. And “system calls” (e.g. talking to other programs, or to hardware like the network, disk, or sound card) can make those switches happen much more often.
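If you’re curious which tick rate your own Linux kernel was built with, that’s the CONFIG_HZ option. A small Python sketch that checks the usual places the build config is exposed (paths vary by distro, and some kernels don’t expose the config at all):

```python
# Small sketch: report the CONFIG_HZ (scheduler tick rate) this Linux
# kernel was built with, if the build config is exposed anywhere.
import gzip
import os
import re

def kernel_hz():
    candidates = ["/proc/config.gz", "/boot/config-" + os.uname().release]
    for path in candidates:
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as f:
                for line in f:
                    m = re.match(r"CONFIG_HZ=(\d+)", line)
                    if m:
                        return int(m.group(1))
        except OSError:
            continue  # config not exposed at this path on this distro
    return None

print("CONFIG_HZ:", kernel_hz())
```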
There are no processes that can run independently on every time scale. There will be many clock cycles where every core is processing, and many where some cores are waiting on some shared resource to be available. Likewise if you look at parallelization via distinct hosts—they’re CLEARLY parallel, but only until they need data or instructions from outside.
The question for this post is “how much is lost to wait times (both context switches and I/O waits), compared with some other way of organizing the work?” The optimizations that are possible are mostly about sizing the units of work so that unnecessary synchronization points and wait times add as little overhead as possible.
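To put a number on the work-sizing point, here’s a hedged Python sketch (an illustration, not anything from the post): the same pile of tiny tasks fanned out to worker processes, once item by item and once in larger chunks, so the only thing that changes is how much dispatch/IPC overhead each unit of useful work pays for:

```python
# Sketch: identical work, different unit sizes. With chunksize=1 every tiny
# task pays its own dispatch/IPC round trip; with a big chunksize that
# overhead is amortized over thousands of tasks.
import time
from concurrent.futures import ProcessPoolExecutor

def tiny_task(n):
    return n * n  # far too small to be worth a round trip on its own

def run(chunksize, items=100_000):
    start = time.perf_counter()
    with ProcessPoolExecutor() as pool:
        # chunksize batches items per worker round trip (ProcessPoolExecutor only)
        total = sum(pool.map(tiny_task, range(items), chunksize=chunksize))
    return time.perf_counter() - start, total

if __name__ == "__main__":
    for cs in (1, 5_000):
        elapsed, checksum = run(cs)
        print(f"chunksize={cs:>5}: {elapsed:.2f}s (checksum {checksum})")
```

On a typical machine the chunked run finishes far sooner, even though both runs do identical arithmetic.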
Interesting. But does this mean “no two tasks are ever executed truly in parallel”, or just “we have true parallel execution but nonetheless have frequent context switches”?
The latter. If you have 8 or 16 cores, it’d be really sad if only one thing was happening at a time.
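An easy way to convince yourself of “the latter”: run the same CPU-bound function once per core, first serially and then through a process pool, and compare wall-clock times. A Python sketch (just an illustration; numbers will vary by machine):

```python
# Sketch: the same CPU-bound function run once per core, serially and then
# via a process pool. A wall-clock speedup is only possible if the cores
# really execute at the same time.
import os
import time
from concurrent.futures import ProcessPoolExecutor

def burn(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    n, jobs = 3_000_000, os.cpu_count() or 4

    start = time.perf_counter()
    serial = [burn(n) for _ in range(jobs)]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(burn, [n] * jobs))
    t_parallel = time.perf_counter() - start

    assert serial == parallel
    print(f"{jobs} tasks: serial {t_serial:.2f}s, parallel {t_parallel:.2f}s")
```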