I think the key thing is that a∧b and a∨b aren’t separately linearly represented (or really even linearly represented in some strong sense). The model “represents” these by a+b with different thresholds like this:
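As a rough numeric sketch of that picture (the thresholds 0.5 and 1.5 are illustrative choices on my part): reading the single direction a+b off against a low threshold gives OR, and against a high threshold gives AND.

```python
# Sketch: one shared direction a+b, two readout thresholds (0.5 and 1.5 assumed here)
for a in (0, 1):
    for b in (0, 1):
        s = a + b
        print(f"a={a} b={b}  a+b={s}  OR={int(s > 0.5)}  AND={int(s > 1.5)}")
```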
So, if we try to compute (a∨b)∧¬(a∧b) linearly, then a∨b corresponds to (a+b), a∧b corresponds to (a+b), ¬ corresponds to negation, and the outer ∧ to adding the two terms, so we get (a+b)−(a+b)=0, which clearly doesn't do what we want!
If we could apply a relu on one side, then we could get this: (a+b)−2relu((a+b)−1). This works, assuming a and b are booleans which are either 0 or 1. See also Fabien's comment, which shows that you naturally get XOR just using random MLPs, assuming some duplication.
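A minimal check of this construction (assuming a, b ∈ {0, 1}; the naive linear combination is included for contrast):

```python
def relu(x):
    return max(0, x)

for a in (0, 1):
    for b in (0, 1):
        linear = (a + b) - (a + b)                # naive linear attempt: always 0
        with_relu = (a + b) - 2 * relu((a + b) - 1)  # one relu applied on the a∧b side
        print(f"a={a} b={b}  linear={linear}  with_relu={with_relu}  XOR={a ^ b}")
```

The `with_relu` column matches a XOR b on all four inputs, while the purely linear version is identically zero.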
More generally, just try to draw the XOR decision boundary on the above diagram!
Ah, I see now! Thanks for the clarifications!