I feel like 7 years from AlexNet to the world of PyTorch, TPUs, tons of ML MOOCs, billion-parameter models, etc. is strong evidence against what you’re saying, right? Or were deep neural nets already a big and hot and active ecosystem even before AlexNet, more than I realize? (I wasn’t paying attention at the time.)
Moreover, even if not all the infrastructure of deep neural nets transfers to a new family of ML algorithms, much of it will. For example, the building up of people and money in ML, the building up of GPU / ASIC servers and the tools to use them, the normalization of the idea that it’s reasonable to invest millions of dollars to train one model and to fab ASICs tailored to a particular ML algorithm, the proliferation of expertise related to parallelization and hardware-acceleration, etc. So if it took 7 years from AlexNet to smooth turnkey industrial-scale deep neural nets and billion-parameter models and zillions of people trained to use them, then I think we can guess <7 years to get from a different family of learning algorithms to the analogous situation. Right? Or where do you disagree?
No, you’re right. I think I’m updating toward thinking there’s a region of nonprosaic short-timelines universes. Overall, though, that region still seems much smaller than the prosaic short-timelines and nonprosaic long-timelines regions.