I trained a (plain) neural network on a couple of occasions to predict the output of the function x1⊕⋯⊕x5, where x1,…,x5 are bits and ⊕ denotes the XOR operation. The neural network was hopelessly confused, even though neural networks usually have no trouble memorizing large quantities of random information. This time the network could not even memorize the truth table for XOR. While the operation (x1,…,x5)↦x1⊕⋯⊕x5 is linear over the field F2, it is highly non-linear over R. The inability of a simple neural network to learn this function suggests that neural networks learn best when they are not required to stray too far from linearity.
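The experiment above can be sketched as follows. This is a minimal reconstruction, not the author's actual setup: it assumes a plain two-layer tanh network with 8 hidden units, trained by full-batch gradient descent on squared error over the complete 32-row truth table of the 5-bit parity function. The architecture, sizes, and learning rate are all illustrative choices.

```python
import numpy as np

# All 2^5 = 32 bit strings and their parity (x1 XOR ... XOR x5).
X = np.array([[(i >> k) & 1 for k in range(5)] for i in range(32)], dtype=float)
y = (X.sum(axis=1) % 2).reshape(-1, 1)  # parity = sum of bits mod 2

# Hypothetical "plain" network: one tanh hidden layer, sigmoid output.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(5, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    h = np.tanh(X @ W1 + b1)       # hidden activations
    p = sigmoid(h @ W2 + b2)       # predicted parity
    # Gradients of mean squared error, by hand.
    dp = (p - y) * p * (1 - p) / len(X)
    dW2 = h.T @ dp; db2 = dp.sum(0)
    dh = (dp @ W2.T) * (1 - h**2)
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

acc = ((p > 0.5) == (y > 0.5)).mean()
print(f"training accuracy on the 32-row truth table: {acc:.2f}")
```

Because parity flips its output whenever any single input bit flips, no linear (or near-linear) decision surface over R separates the classes; depending on the random seed and hyperparameters, a small network like this may sit far below 100% accuracy even on this tiny memorization task.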