In my understanding, here are the main features of deep convolutional neural networks (DCNNs) that make them work so well. (Disclaimer: I am not a specialist in CNNs; I have taken one master's-level deep learning course and have spent 3 months working on accelerating DCNNs.) For each feature, I give the probability that having it is an important component of DCNN success, compared to having it only to the extent that an average non-DCNN machine learning model does (e.g. a DCNN has weight sharing, an average model doesn't).
1. DCNNs heavily use transformations that are the same for every window of the input, i.e. weight sharing (see the code sketch after this list) − 95%
2. For any set of input pixels, large distances between the pixels in the set make the DCNN model the interactions between those pixels less accurately − 90% (the use of dilated convolutions in some DCNNs is perhaps a counterargument to this)
3. Large depth (together with the use of activation functions) lets us model complicated features, interactions, and logic − 82%
4. Having a lot of parameters lets us model complicated features, interactions, and logic − 60%
5. Given 3 and 4, SGD-like optimization works unexpectedly fast for some reason − 40%
6. Given 3 and 4, SGD-like optimization with early stopping doesn't overfit too much for some reason − 87% (I am not sure whether the "S" in SGD is important, or how important early stopping is)
7. Given 3 and 4, ReLU-like activation functions work really well (compared to, for example, the sigmoid).
8. Modern deep neural network libraries are easy to use, compared to the baseline of not having specific well-developed libraries − 60%
9. Deep neural networks run really fast when using modern deep neural network libraries and modern hardware − 33%
10. DCNNs find features in photos that are invisible to the human eye and to most ML algorithms − 20%
11. Dropout helps reduce overfitting a lot − 25%
12. Batch normalization improves the quality of the model a lot for some reason − 15%
13. Batch normalization makes the optimization much faster − 32%
14. Skip connections (or residual connections; I am not sure whether there's a difference) help a lot − 20%
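To make the list more concrete, here is a minimal PyTorch sketch (my own illustration, not a model from this post) that puts several of the listed ingredients in one place: weight sharing via convolutions (item 1), depth with ReLU nonlinearities (items 3 and 7), dropout (item 11), batch normalization (items 12 and 13), and a skip/residual connection (item 14). The channel count and dropout rate are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

class TinyResidualBlock(nn.Module):
    """A generic residual conv block illustrating items 1, 3, 7, 11-14 above."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # Conv2d applies the *same* 3x3 kernel at every spatial window:
        # this is the weight sharing of item 1, and the small receptive field
        # is why distant pixels only interact through many layers (item 2).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # items 12-13
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop = nn.Dropout2d(p=0.1)       # item 11
        self.relu = nn.ReLU()                 # item 7

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.drop(out)
        out = self.bn2(self.conv2(out))
        # Skip (residual) connection, item 14: add the input back before
        # the final nonlinearity.
        return self.relu(out + x)

if __name__ == "__main__":
    block = TinyResidualBlock()
    x = torch.randn(1, 16, 32, 32)            # (batch, channels, H, W)
    print(block(x).shape)                     # torch.Size([1, 16, 32, 32])
```

Stacking many such blocks is what gives the large depth of item 3; the remaining items (SGD, early stopping, libraries, hardware) concern how such a network is trained and run rather than its architecture.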
Let me clarify how I assigned these probabilities and why I made this list. I am trying to come up with a tensor-network-based machine learning model that would have the main advantages of DCNNs, but would not itself be a deep ReLU neural network. So I made this list to see which of the important components my model has.