For any given period of time, algorithmic progress is a bigger deal for increasing performance than the degree to which compute got cheaper over the same period.
This is true, but as a picture of the past, it undersells compute by focusing on the cost of compute rather than on compute itself.
I.e., in the period between 2012 and 2020:
-- Algo efficiency improved 44x, if we use the OpenAI efficiency baseline for AlexNet.
-- Cost of compute improved by… less than 44x, let’s say, if we use a reasonable guess based on Moore’s law. So algo efficiency mattered more than the falling cost per FLOP.
-- But, using EpochAI’s estimate of a 6-month doubling time, total compute per training run increased > 10,000x.
So just looking at the cost of compute is somewhat misleading. Cost per FLOP went down, but the amount spent per training run went up from just dollars to tens of thousands of dollars.
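For concreteness, here is a minimal sketch of the arithmetic behind those three factors; the 6-month and 2-year doubling times are illustrative assumptions standing in for the EpochAI and Moore’s-law figures above, not measured values:

```python
# Back-of-the-envelope growth factors for the 2012-2020 window (8 years).
# Assumptions for illustration only: ~6-month doubling for total training
# compute (the EpochAI figure above) and ~2-year doubling for cost per FLOP
# (a rough Moore's-law-style guess).

YEARS = 2020 - 2012

def growth_factor(doubling_time_years, years=YEARS):
    """Total multiplicative growth over `years` at a fixed doubling time."""
    return 2 ** (years / doubling_time_years)

total_compute   = growth_factor(0.5)  # 2**16 ~ 65,000x, consistent with "> 10,000x"
cost_per_flop   = growth_factor(2.0)  # 2**4  = 16x, i.e. "less than 44x"
algo_efficiency = 44                  # the OpenAI AlexNet-efficiency figure

print(f"total training compute: ~{total_compute:,.0f}x")
print(f"cost per FLOP:          ~{cost_per_flop:,.0f}x")
print(f"algo efficiency:        ~{algo_efficiency}x")
```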
It is ridiculous to interpret this as some general algo efficiency improvement: it’s a specific improvement on a specific measure (FLOPs), which doesn’t even translate directly into equivalent wall-clock performance, and which was already encapsulated in existing sparsity techniques.
There has been extremely little improvement in general algorithm efficiency, compared to hardware improvement.
Not disagreeing. Am still interested in a longer-form account of why the 44x estimate is an overestimate, if you’re interested in writing it (I think you mentioned looking into it at one point).
It’s like starting with an uncompressed image, compressing it further each year with a different compressor (not even the best one known, since better compressors were already available at the start), then measuring the reduction in data size over time and claiming it as a form of “general software efficiency improvement”. It’s nothing remotely comparable to Moore’s-law progress, which actually improves a wide variety of software across the board.