The record has apparently been broken again: “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” (HN, Reddit), Ioffe & Szegedy 2015:

Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
...The current reported best result on the ImageNet Large Scale Visual Recognition Competition is reached by the Deep Image ensemble of traditional models (Wu et al., 2015). Here we report a top-5 validation error of 4.9% (and 4.82% on the test set), which improves upon the previous best result despite using 15X fewer parameters and lower resolution receptive field. Our system exceeds the estimated accuracy of human raters according to Russakovsky et al. (2014).
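For concreteness, here is a minimal NumPy sketch of the per-mini-batch transform the abstract describes; the function name and array shapes are my own, and I omit the running averages of the statistics that the paper maintains for use at inference time:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x:     (batch_size, num_features) activations for one mini-batch
    gamma: (num_features,) learned scale
    beta:  (num_features,) learned shift
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # learned scale and shift

# Toy usage: a mini-batch of 32 examples with 4 features.
x = 3.0 * np.random.randn(32, 4) + 2.0
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))       # roughly 0 and 1 per feature
```

At test time the paper swaps the mini-batch statistics for population estimates accumulated during training, so the output depends only on the input rather than on the rest of the batch.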
On the human-level accuracy rate:

… About ~3% is an optimistic estimate without my “silly errors”.
...I don’t at all intend this post to somehow take away from any of the recent results: I’m very impressed with how quickly multiple groups have improved from 6.6% down to ~5% and now also below! I did not expect to see such rapid progress. It seems that we’re now surpassing a dedicated human labeler. And imo, when we are down to 3%, we’d be matching the performance of a hypothetical super-dedicated fine-grained expert human ensemble of labelers.