There was a critical moment in 2006 when Hinton and Salakhutdinov proposed training restricted Boltzmann machines (RBMs) unsupervised, one layer at a time, and then 'unrolling' the stack of RBMs to initialize the weights of a deep network, from which you could do further gradient descent updates, because the activations and gradients wouldn't explode or die out given that initialization. That got people to, I dunno, 6 layers instead of 3 layers or something? But it focused attention on the problem of vanishing/exploding gradients as the reason why deeply layered neural nets never worked, and that kicked off the entire modern field of deep learning, more or less. A rough sketch of the procedure is below.
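For concreteness, here is a minimal sketch of that greedy layer-wise scheme: train one RBM per layer with contrastive divergence, feed each layer's hidden activities to the next RBM, then reuse the learned weights to initialize a feed-forward net for backprop fine-tuning. This is not Hinton and Salakhutdinov's code; the layer sizes, CD-1 updates, and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of greedy layer-wise RBM pretraining (assumptions: CD-1,
# binary units, toy hyperparameters). Not the original 2006 implementation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1_step(self, v0):
        """One contrastive-divergence (CD-1) update on a minibatch v0."""
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)   # reconstruction
        h1 = self.hidden_probs(v1)
        # Gradient estimate: positive phase minus negative phase
        self.W   += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=10, batch=32):
    """Greedily train one RBM per layer; each layer's hidden activities
    become the training data for the next RBM up the stack."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(x), batch):
                rbm.cd1_step(x[i:i + batch])
        rbms.append(rbm)
        x = rbm.hidden_probs(x)   # propagate data up for the next layer
    return rbms

if __name__ == "__main__":
    toy_data = (rng.random((256, 64)) < 0.3).astype(float)   # fake binary data
    stack = pretrain_stack(toy_data, layer_sizes=[32, 16])
    # 'Unroll' the stack: these weights would initialize a feed-forward net,
    # which is then fine-tuned end-to-end with ordinary backprop.
    init_weights = [(r.W.copy(), r.b_h.copy()) for r in stack]
    print([w.shape for w, _ in init_weights])   # [(64, 32), (32, 16)]
```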
Does anyone have a good summary of the pre-AlexNet history of neural nets? This comment, and others here about ReLUs, contradicts what I was taught in master's-level CS AI/ML classes (in 2018), and what Ngo seems to have in his model: that neural nets were mostly hardware-limited throughout their winter.