I notice that the ML GPUs are not the best bang-for-your-buck in this chart. I assume that researchers prefer them because they pack more ‘bang’ (FLOP/s) into one unit, and that distributing across multiple cards carries a performance penalty and/or adds complexity. How do factors like the cost of the rig (motherboard, power supply, case) and the cost of electricity play into this? Would a large cluster of commodity GPUs be an effective research setup that just isn’t economically competitive with ML GPUs, or would it be impractical at research scale?
I believe the performance/complexity penalty generally makes large clusters of cheap consumer GPUs not viable, with memory capacity being the biggest problem. From the outside looking in, it takes a lot of effort and reengineering just to get many ML projects to do inference on consumer GPUs with less memory, and even more work to make training possible across numerous low-memory GPUs. And in the vast majority of cases the authors say it’s not even possible.
The lone exception is the consumer RTX 3090, a massive outlier with 24 GB of memory. But in pure FLOPS the RTX 3080 is almost equivalent to a 3090, yet it has only 10 GB.
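To make the memory point concrete, here is a rough back-of-envelope sketch (my own illustrative numbers, not from the chart): training with Adam in fp32 typically keeps roughly four parameter-sized buffers resident (weights, gradients, and two optimizer moment buffers), before counting activations.

```python
# Back-of-envelope estimate of GPU memory needed just for parameters and
# optimizer state when training with Adam in fp32.
# Weights + gradients + Adam's two moment buffers = 4 parameter-sized copies,
# each at 4 bytes per parameter. Activations and batch data come on top.
def training_memory_gb(num_params, bytes_per_param=4, copies=4):
    return num_params * bytes_per_param * copies / 1e9

# A 1-billion-parameter model already needs ~16 GB of optimizer state alone:
# over a 3080's 10 GB, but within a 3090's 24 GB.
print(training_memory_gb(1e9))  # 16.0
```

This is why raw FLOPS parity between a 3080 and a 3090 doesn't help much for training: the model either fits in memory or it doesn't, and splitting one model's state across many small-memory cards is exactly the reengineering effort described above.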