As a follow-up to building the model, I was looking into specialized AI hardware and I have to say that I’m very uncertain about the claimed efficiency gains. There are some parts of the AI training pipeline that could be improved with specialized hardware but others seem to be pretty close to their limits.
We intend to understand this better and publish a piece in the future but it’s currently not high on the priority list.
Also, when compared to CPUs, it’s no wonder that any parallelized hardware is 1000x more efficient. So it really depends on what the exact comparison is that the authors used.
Sorry, I’m a bit confused. I’m interpreting the 1st and 3rd paragraphs of your response as expressing opposite opinions about the claimed efficiency gains (uncertainty and confidence, respectively), so I think I’m probably misinterpreting part of your response?
By uncertainty I mean that I really don’t know, i.e. I could imagine both very high and very low gains. I didn’t want to express that I’m skeptical.
For the third paragraph, I guess it depends on what you think of as specialized hardware. If you think GPUs are specialized hardware, then a gain of 1000x from CPUs to GPUs sounds very plausible to me. If you think GPUs are the baseline and specialized hardware means e.g. TPUs, then a 1000x gain sounds implausible to me.
My original answer wasn’t that clear. Does this make more sense to you?
It does, thanks! (I had interpreted the claim in the paper as comparing e.g. TPUs to CPUs, since the quote mentions CPUs as the baseline.)