Douglas_Knight comments on The longest training run

Douglas_Knight 19 Aug 2022 19:12 UTC
Are you assuming that electricity is free? My understanding is that the cost of silicon is small compared to the cost of electricity if you run the chip all the time, as in this article. For example, this GPU costs $60 and consumes 300 watts ≈ 2,600 kWh/year ≈ $260/year at $0.10/kWh. This one costs 10x as much and draws 3x the power, so its price is not negligible, but it is still less than a year of electricity. Plus, I think the data-center rule of thumb is that you should multiply the electricity cost by 2 to account for cooling.
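A quick check of the electricity arithmetic, as a Python sketch. The $60 / 300 W card and the card costing 10x with 3x the power are the figures from this comment; continuous operation (8,760 hours/year), $0.10/kWh, and the 2x cooling multiplier are the assumptions stated here. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # continuous operation, ignoring leap years


def annual_energy_kwh(watts: float) -> float:
    """Energy used by a device drawing `watts` around the clock for one year."""
    return watts / 1000 * HOURS_PER_YEAR


def annual_electricity_cost(watts: float, usd_per_kwh: float = 0.10,
                            cooling_multiplier: float = 1.0) -> float:
    """Yearly electricity bill; cooling_multiplier=2.0 applies the
    rule-of-thumb doubling for cooling."""
    return annual_energy_kwh(watts) * usd_per_kwh * cooling_multiplier


# Figures from the comment: a $60, 300 W card and one costing 10x with 3x the power.
cards = [("cheap card", 60.0, 300.0), ("big card", 600.0, 900.0)]

for name, price_usd, watts in cards:
    bare = annual_electricity_cost(watts)
    cooled = annual_electricity_cost(watts, cooling_multiplier=2.0)
    months = 12 * price_usd / bare  # months of (uncooled) electricity equal to the price
    print(f"{name}: ${price_usd:.0f} to buy, {annual_energy_kwh(watts):.0f} kWh/yr, "
          f"${bare:.0f}/yr (${cooled:.0f}/yr with cooling), "
          f"price = {months:.1f} months of electricity")
```

Under these assumptions the cheap card's price equals under three months of electricity and the big card's about nine months, before the cooling multiplier is even applied.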
This will have a very large effect on the total compute bought, figures that appear only in the graph. The headline numbers, the optimal times, depend mainly on the exponential form of the improvement in efficiency. If the time for the cost of silicon to be cut in half is the same as the time for the electricity needed per unit of compute to be cut in half (Moore’s law vs. Koomey’s law), then you should get roughly the same answer. Koomey’s law used to be faster, but since the breakdown of Dennard scaling it seems to be slower.
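A minimal sketch of why equal halving times give the same answer, assuming a hypothetical two-component model: the cost of a unit of compute is an amortized silicon share plus an electricity share, each decaying exponentially with its own halving time. The model, the function names, and the 2-, 2.5-, and 3-year halving times in the example are illustrative assumptions, not figures from the post.

```python
def cost_components(t_years: float, silicon_share: float,
                    t_half_silicon: float, t_half_energy: float) -> tuple[float, float]:
    """Silicon and electricity parts of the cost of a unit of compute at time t,
    normalized so that they sum to 1.0 at t = 0 (hypothetical model)."""
    silicon = silicon_share * 2 ** (-t_years / t_half_silicon)        # Moore-style decay
    energy = (1.0 - silicon_share) * 2 ** (-t_years / t_half_energy)  # Koomey-style decay
    return silicon, energy


def relative_cost(t_years, silicon_share, t_half_silicon, t_half_energy):
    """Total cost of a unit of compute at time t, relative to t = 0."""
    return sum(cost_components(t_years, silicon_share, t_half_silicon, t_half_energy))


def effective_halving_time(t_years, silicon_share, t_half_silicon, t_half_energy):
    """Local halving time ln(2) / (-d ln c / dt) of the total cost c(t)."""
    silicon, energy = cost_components(t_years, silicon_share, t_half_silicon, t_half_energy)
    # -d ln c / dt = ln 2 * (silicon / t_half_silicon + energy / t_half_energy) / c,
    # so the ln 2 factors cancel and the halving time is c over the weighted rate.
    return (silicon + energy) / (silicon / t_half_silicon + energy / t_half_energy)


# Equal halving times: after 10 years the cost is 2**-4 whatever the split.
for share in (0.2, 0.5, 0.8):
    print(share, round(relative_cost(10.0, share, 2.5, 2.5), 6))

# Unequal halving times: the slower (electricity) component increasingly dominates.
for t in (0.0, 5.0, 10.0):
    print(t, round(effective_halving_time(t, 0.2, 2.0, 3.0), 3))
```

With equal halving times the silicon/electricity split drops out entirely. With unequal ones, the effective halving time drifts from a blend of the two toward the slower one as the faster-shrinking component becomes negligible.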
As for a GPU-specific version of Koomey’s law, I don’t know of one. Does that data set of GPUs have watt ratings?
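If the data set did include watt ratings, one way to get a GPU-specific Koomey’s-law figure would be a log-linear fit of compute per watt against release year. A minimal sketch, assuming per-card lists of release year, peak FLOP/s, and rated watts (hypothetical inputs; rated watts may differ from actual draw). The demo data are synthetic, built to double in efficiency exactly every 2.5 years, and serve only to check that the fit recovers that rate.

```python
import math


def efficiency_doubling_time(years, flops, watts):
    """Years for FLOP/s per watt to double, from an ordinary least-squares fit
    of log2(FLOP/s per watt) against release year."""
    xs = list(years)
    ys = [math.log2(f / w) for f, w in zip(flops, watts)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    doublings_per_year = sxy / sxx  # slope of the fitted line
    return 1.0 / doublings_per_year


# Synthetic demo data (not real GPUs): efficiency doubles exactly every 2.5 years.
years = [2010, 2012, 2014, 2016, 2018]
watts = [200.0, 250.0, 300.0, 250.0, 300.0]
flops = [w * 4e9 * 2 ** ((y - 2010) / 2.5) for y, w in zip(years, watts)]

print(round(efficiency_doubling_time(years, flops, watts), 3))
```

The fitted efficiency doubling time is also the halving time of electricity needed per unit of compute, which is the number to compare against the halving time of silicon cost.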