I think this is incorrect. You might imagine that the CPU->GPU and GPU->TPU transitions were steps up a tall log-scale tech ladder, the way Moore's-law doublings were, with many more steps still possible in theory. But this is not the case, because the metric these transitions were improving was "fraction of transistors dedicated to useful compute" (as opposed to extracting parallelism from a serial instruction stream, or computing unnecessary low-order bits in overly wide floating-point formats). This metric has a hard upper limit of 100%, and I don't think there's even one order of magnitude left between current utilization and that limit.
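To make the ceiling concrete, here is a minimal sketch. It assumes throughput scales linearly with the fraction of transistors doing useful compute and that nothing else (transistor count, clock, process node) changes. Under that assumption, the best-case speedup from pushing the fraction to 100% is just its reciprocal. The function name and the example fractions are hypothetical, for illustration only, not measurements of any real chip.

```python
def max_headroom(useful_fraction: float) -> float:
    """Upper bound on the speedup from raising the useful-compute fraction to 100%.

    Assumes (hypothetically) that throughput is proportional to the fraction of
    transistors doing useful work, with everything else held fixed.
    """
    if not 0.0 < useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be in (0, 1]")
    return 1.0 / useful_fraction


# Hypothetical utilization figures, chosen only to show the shape of the bound.
for f in (0.05, 0.10, 0.25, 0.50):
    print(f"useful fraction {f:.0%}: at most {max_headroom(f):.0f}x headroom")
```

Under this model, "less than one order of magnitude left" is the same claim as "more than 10% of transistors are already doing useful compute".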
No, I think we mostly agree. I'd expect TPUs to be within, say, 4x of practically optimal for the things they do. The remaining order of magnitude that I think is possible for non-novel tasks has more to do with specialisation, e.g. model-specific hardware design, and that definitely has an asymptote.
The interesting case is if we can get TPU-equivalent hardware days after designing a new architecture, instead of years after, because (IMO) 1,000x speedups over CPUs are plausible.