This isn’t quite “lock-in”, but it’s related in the sense that an outside force shaped the field of “deep learning”.
I suspect the videogame industry, and the GPUs that were developed for it, have locked in the type of technologies we now know as deep learning. GPUs were originally ASICs developed for playing videogames, so there are specific types of operations they were optimized to perform.
I suspect that neural network architectures that leveraged these hardware optimizations outperformed other neural networks. Conv nets and Transformers are probably evidence of this. The former leverages convolution, and the latter leverages matrix multiplication. In turn, GPUs and ASICs have been optimized to run these successful neural networks faster, with NVIDIA rolling out Tensor Cores and Google deploying their TPUs.
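To make that concrete, here’s a minimal NumPy sketch (my own illustration, not tied to any framework’s actual kernels) of how both of these workloads bottom out in dense matrix multiplies, exactly the operation GPUs and Tensor Cores are built to chew through:

    import numpy as np

    # Scaled dot-product attention: nothing but matmuls plus a softmax.
    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])                # matmul
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)              # softmax
        return weights @ V                                     # matmul

    # A 1-D convolution via the standard "im2col" trick: unroll the input
    # into overlapping windows, and the convolution becomes a single matmul.
    def conv1d_as_matmul(x, filters):  # x: (n,), filters: (num_filters, width)
        w = filters.shape[1]
        windows = np.stack([x[i:i + w] for i in range(len(x) - w + 1)])
        return windows @ filters.T                             # matmul

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)                            # (4, 8)
    print(conv1d_as_matmul(rng.normal(size=16),
                           rng.normal(size=(3, 5))).shape)     # (12, 3)

As I understand it, this im2col-to-GEMM lowering is one of the ways libraries like cuDNN actually implement convolution in practice, which only tightens the hardware–software feedback loop.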
Looking back, it’s hard to argue that this combination of hardware and software isn’t a local optimum, and that if we were to redesign the whole stack from the bottom up, technologies with the capabilities of modern “deep learning” wouldn’t look completely different.
It’s not even clear how one could find another optimum in the space of algorithms+hardware at this point either. The current stack benefits from both open-source contributions and massive economies of scale.