Ilio answers: Why does gradient descent always work on neural networks?