This is the wrong angle from which to look at the question. Efficiency is a curve, and desktop GPUs sit at a point on it where large changes in power produce much smaller changes in performance. Doubling the power into a top-end desktop GPU would not come anywhere near doubling its performance, and likewise halving the power only marginally reduces it.
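To make the shape of that curve concrete, here is a minimal sketch under the usual dynamic-power assumption that performance tracks clock while power grows roughly with the cube of clock (since voltage has to rise along with frequency). The exponent and numbers are illustrative assumptions, not measurements of any particular card.

```python
# Crude DVFS model (illustrative assumption, not measured data):
# performance ~ clock f, power ~ f * V^2, and V rises roughly with f,
# giving power ~ f^3 near the top of the operating range.
def perf_at_power(power_ratio, exponent=3.0):
    """Relative performance when power is scaled by power_ratio."""
    return power_ratio ** (1.0 / exponent)

for ratio in (0.5, 1.0, 2.0):
    print(f"power x{ratio}: perf x{perf_at_power(ratio):.2f}")
# power x0.5: perf x0.79   -> halving power costs only ~21% performance
# power x1.0: perf x1.00
# power x2.0: perf x1.26   -> doubling power buys only ~26% performance
```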
Are you talking about clock rates? Those haven’t changed for GPUs in a while, so I’m assuming they will remain essentially fixed. Doubling the power into a desktop GPU at a fixed clock rate (and ignoring the dark silicon fraction) thus corresponds to doubling the transistor count (at the same per-transistor energy efficiency), which would double performance, power, and thermal draw all together.
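As a minimal restatement of that fixed-clock accounting (purely illustrative; it just encodes the linearity assumption): with voltage and clock held constant, both throughput and power scale with the amount of active silicon, so perf/W stays flat while absolute performance and heat double together.

```python
# Fixed-clock, fixed-voltage scaling: perf and power both track the
# active transistor count, so perf/W is unchanged. Illustrative only.
def scale_silicon(n_ratio):
    perf = 1.0 * n_ratio   # throughput ~ active transistor count
    power = 1.0 * n_ratio  # dynamic power ~ active transistor count
    return perf, power, perf / power

print(scale_silicon(2.0))  # (2.0, 2.0, 1.0): 2x perf, 2x heat, same perf/W
```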
This is not true. GPUs can run shader cores and RT cores at the same time, for example. The reason for dedicated hardware for AI and ray tracing is that dedicated hardware is significantly more efficient (both per transistor and per watt) at doing those tasks.
Jensen explicitly mentioned dark silicon as a motivator in some presentation about the new separate FP/int paths in Ampere, and I’m assuming the same probably applies at some level internally for the many paths inside tensor cores and RT cores. I am less certain about perf/power when simultaneously maxing tensor cores + RT cores + ALU cores + memory paths, but I’m guessing it would hit thermal limits and underclock to some degree.
The point is that those products are operating at a much more efficient point on the power-performance curve. Laptop NVIDIA GPUs use the same dies as their desktop counterparts (though not always under the same model number; a 3080 Mobile uses the desktop 3070 Ti’s die, not the desktop 3080’s).
Primarily through lowered clock rates or dark silicon. I ignored clock rates because they seem irrelevant for the future of Moore’s law.
Google has unusually efficient data-centers, but I’d also bet that efficiency measure isn’t for a pure GPU datacenter, which would have dramatically higher energy density, and thus greater cooling challenges, than their typical light-CPU, heavy-storage, search-optimized servers.
Clock rate is relevant. Or rather, the underlying aspects that partly determine clock rate are relevant. It is true that doubling transistor density while holding all else equal would produce much more heat, but that wouldn’t be the only option if thermal constraints were the dominant factor.
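A rough sketch of that alternative, under the standard assumptions that throughput scales with transistor count times clock and dynamic power with count times clock times voltage squared (the specific numbers are illustrative, not tied to any real part): widen the chip, then pull clock and voltage down so total power stays flat, and net throughput still rises.

```python
# Illustrative trade: double the transistor count, then lower clock and
# voltage until total power returns to the baseline. Assumes
# perf ~ N * f and dynamic power ~ N * f * V^2.
def perf_and_power(n, f, v):
    perf = n * f            # throughput ~ transistors * clock
    power = n * f * v ** 2  # dynamic power ~ N * f * V^2
    return perf, power

print(perf_and_power(n=1.0, f=1.0, v=1.0))     # baseline: (1.0, 1.0)
print(perf_and_power(n=2.0, f=0.70, v=0.845))  # (~1.4, ~1.0): ~40% more
                                               # perf in the same envelope
```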
I agree there is only so much room to be gained here, which would quickly vanish in the face of exponential trends, but this part of our debate came up in the context of whether current GPUs are already past this point. I claim they aren’t, and the fact that they run so far past the point of maximal energy efficiency is evidence for that.
Jensen explicitly mentioned dark silicon as a motivator in some presentation about the new separate FP/int paths in Ampere
This doesn’t make sense technically; if anything, Ampere moves in the opposite direction by making both datapaths able to do FP32 simultaneously (though this is ultimately a mild effect that isn’t really relevant). To quote the GA102 whitepaper:
Most graphics workloads are composed of 32-bit floating point (FP32) operations. The Streaming Multiprocessor (SM) in the Ampere GA10x GPU Architecture has been designed to support double-speed processing for FP32 operations. In the Turing generation, each of the four SM processing blocks (also called partitions) had two primary datapaths, but only one of the two could process FP32 operations. The other datapath was limited to integer operations. GA10x includes FP32 processing on both datapaths, doubling the peak processing rate for FP32 operations. As a result, GeForce RTX 3090 delivers over 35 FP32 TFLOPS, an improvement of over 2x compared to Turing GPUs.
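For what it’s worth, the quoted figure of over 35 FP32 TFLOPS checks out against the published RTX 3090 specs (10496 FP32 CUDA cores at roughly a 1.7 GHz boost clock, counting a fused multiply-add as two FLOPs); the clock should be treated as approximate.

```python
# Back-of-the-envelope check of the whitepaper's ">35 FP32 TFLOPS" claim.
cuda_cores = 10496            # RTX 3090 FP32 CUDA cores
boost_clock_hz = 1.70e9       # approximate boost clock
flops_per_core_per_cycle = 2  # FMA = multiply + add

tflops = cuda_cores * boost_clock_hz * flops_per_core_per_cycle / 1e12
print(f"{tflops:.1f} FP32 TFLOPS")  # ~35.7
```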
I briefly looked for the source for your comment and didn’t find it.
Google has unusually efficient data-centers
We are interested in the compute frontier, so this is still relevant. I don’t share the intuition that higher energy density would make cooling massively less efficient.
I was aware the 3090 had 2x FP32, but I thought that dual-FP datapath was specific to the GA102. Actually, the GA102 just has 2x the ALU cores per SM compared to the GA100.
We are interested in the compute frontier, so this is still relevant. I don’t share the intuition that higher energy density would make cooling massively less efficient.
There are efficiency transitions from passive to active cooling, air to liquid, and so on, all of which depend on energy density.