I think the key thing is that a∧b and a∨b aren’t separately linearly represented (or really even linearly represented in some strong sense). The model “represents” these by a+b with different thresholds like this:
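As a rough numeric sketch of that picture (the thresholds 0.5 and 1.5 are illustrative choices on my part): reading the single direction a+b off against a low threshold gives OR, and against a high threshold gives AND.

```python
# Sketch: one shared direction a+b, two readout thresholds (0.5 and 1.5 assumed here)
for a in (0, 1):
    for b in (0, 1):
        s = a + b
        print(f"a={a} b={b}  a+b={s}  OR={int(s > 0.5)}  AND={int(s > 1.5)}")
```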
So, if we try to compute (a∨b)∧¬(a∧b) linearly, then a∨b corresponds to (a+b), a∧b corresponds to (a+b), ¬ corresponds to negation, and the outer ∧ to adding the two terms, so we get (a+b)−(a+b)=0, which clearly doesn't do what we want!
If we could apply a relu on one side, then we could get this: (a+b)−2relu((a+b)−1). This works, assuming a and b are booleans which are either 0 or 1. See also Fabien's comment, which shows that you naturally get XOR just using random MLPs, assuming some duplication.
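A minimal check of this construction (assuming a, b ∈ {0, 1}; the naive linear combination is included for contrast):

```python
def relu(x):
    return max(0, x)

for a in (0, 1):
    for b in (0, 1):
        linear = (a + b) - (a + b)                # naive linear attempt: always 0
        with_relu = (a + b) - 2 * relu((a + b) - 1)  # one relu applied on the a∧b side
        print(f"a={a} b={b}  linear={linear}  with_relu={with_relu}  XOR={a ^ b}")
```

The `with_relu` column matches a XOR b on all four inputs, while the purely linear version is identically zero.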
More generally, just try to draw the XOR decision boundary on the above diagram!
Ah, I see now! Thanks for the clarifications!