Matteo Capucci comments on What’s up with LLMs representing XORs of arbitrary features?

Matteo Capucci 4 Jan 2024 19:48 UTC
1 point
Here’s what my spidey sense is telling me: the model is trying to fit as many representations as possible (IIRC this is known in mech interp), and by merely pushing features $a$ and $b$ apart in a high-dimensional space you end up making $a \oplus b$ linearly separable. That is, there might be a combinatorial phenomenon underlying this, which feels counterintuitive because of the large dimensions involved.
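To make the hunch concrete, here is a toy sketch of one possible reading of it, not a claim about what an actual LLM does. If the four $(a, b)$ states sit exactly on a plane (a purely linear embedding), XOR is the classic non-separable case. Once the points are in general position in 3 or more dimensions, any labeling of 4 points is linearly separable, XOR included. The names `embed` and `perceptron_separates`, the dimension 8, and the size of the offset are all assumptions made up for illustration. The perceptron is only a convenient separability probe: failing within a fixed number of epochs is evidence of non-separability, not a proof.

```python
PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_LABELS = [1 if a ^ b else -1 for a, b in PAIRS]  # +1 where a XOR b is true


def embed(a, b, dim=8, lift=0.0):
    """Embed the bit pair (a, b) using feature directions e_0 and e_1.

    With lift == 0 the embedding is purely linear in a and b, so the four
    points form a flat square. With lift != 0 the (1, 1) point is nudged
    along a third axis e_2, standing in for any generic deviation from
    exact linearity (an assumption of this sketch).
    """
    x = [0.0] * dim
    x[0] = float(a)
    x[1] = float(b)
    x[2] = lift * a * b
    return x


def perceptron_separates(points, labels, epochs=1000):
    """Return True if a perceptron finds an affine separator within `epochs` passes."""
    w = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


planar = [embed(a, b) for a, b in PAIRS]            # exactly linear in a, b
lifted = [embed(a, b, lift=0.5) for a, b in PAIRS]  # (1, 1) pushed off the plane

print(perceptron_separates(planar, XOR_LABELS))  # False
print(perceptron_separates(lifted, XOR_LABELS))  # True
```

In the lifted case an explicit separator is $w = (2, 2, -8, 0, \dots)$ with bias $-1$. The extra dimensions are not doing anything exotic, they just give the four points room to stop being coplanar.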