Thanks, you’re correct that my definition breaks in this case. I will say that this situation is a bit pathological for two reasons:
The mode of a uniform distribution doesn’t coincide with its mean.
The variance of the multivariate uniform distribution $U([0,1]\times[0,1])$ is largest along the direction $x_1+x_2$, which is exactly the direction which we would want to represent a AND b.
I’m not sure exactly which assumptions should be imposed to avoid pathologies like this, but maybe something of the form: we are working with boolean features $f$ whose class-conditional distributions $D(\cdot\mid f), D(\cdot\mid\lnot f)$ satisfy properties like:
$D(\cdot\mid f)$ and $D(\cdot\mid\lnot f)$ are unimodal, and their modes coincide with their means
The variance of $D(\cdot\mid f)$ and $D(\cdot\mid\lnot f)$ along any direction is not too large relative to the difference of the means $\mathbb{E}_{x\sim D(\cdot\mid f)}[x]-\mathbb{E}_{x\sim D(\cdot\mid\lnot f)}[x]$
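One way to make the second condition precise might be the following (a sketch only; $\varepsilon$ is an unspecified small constant, not something pinned down here): for every unit vector $u$ and each class $c\in\{f,\lnot f\}$,
$$\operatorname{Var}_{x\sim D(\cdot\mid c)}\bigl(u^{\top}x\bigr)\;\le\;\varepsilon\,\bigl\lVert\mathbb{E}_{x\sim D(\cdot\mid f)}[x]-\mathbb{E}_{x\sim D(\cdot\mid\lnot f)}[x]\bigr\rVert^{2}.$$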
The variance of the multivariate uniform distribution $U([0,1]\times[0,1])$ is largest along the direction $x_1+x_2$, which is exactly the direction which we would want to represent a AND b.
The variance is actually the same in all directions. One can sanity-check by integration that the variance is $1/12$ both along the axis and along the diagonal.
In fact, there’s nothing special about the uniform distribution here: the variance should be independent of direction for any $N$-dimensional joint distribution whose $N$ coordinate distributions are independent and have equal variance.[1]
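A quick sketch of why this should hold, assuming the $N$ coordinates are independent with common variance $\sigma^2$: for any unit vector $u$,
$$\operatorname{Var}(u^{\top}x)\;=\;\sum_{i=1}^{N}u_i^{2}\,\operatorname{Var}(x_i)\;=\;\sigma^{2}\sum_{i=1}^{N}u_i^{2}\;=\;\sigma^{2},$$
so the variance is the same in every direction; for $U([0,1]^2)$ each coordinate has variance $1/12$, which is therefore also the variance along the diagonal.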
The diagram in the post showing that “and” is linearly represented works if the features are represented discretely (so that there are exactly 4 points for 2 binary features, instead of a distribution for each combination). As soon as you start defining features with thresholds like DanielVarga did, the argument stops going through in general, and the claim can become false.
The stuff about unimodality doesn’t seem relevant to me, and in fact seems directionally wrong.
[1] I have a not-fully-verbalized proof which I don’t have time to write out.
Thanks, you’re totally right about the equal variance thing; I had stupidly thought that the projection of $U([0,1]^2)$ onto $y = x$ would be uniform on $[-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}]$ (obviously false!).
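For the record, a quick check of what that projection actually is: projecting onto the unit vector along $y=x$ gives $(x_1+x_2)/\sqrt{2}$, which has a triangular distribution on $[0,\sqrt{2}]$ rather than a uniform one, with variance $\tfrac{1}{2}\bigl(\tfrac{1}{12}+\tfrac{1}{12}\bigr)=\tfrac{1}{12}$, the same as along each axis.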
The case of a fully discrete distribution (supported in this case on four points) seems like a very special case of something more general, where a “more typical” special case would be something like:
if a, b are both false, then sample from $\mathcal{N}(0,\Sigma)$
if a is true and b is false, then sample from $\mathcal{N}(\mu_a,\Sigma)$
if a is false and b is true, then sample from $\mathcal{N}(\mu_b,\Sigma)$
if a and b are both true, then sample from $\mathcal{N}(\mu_a+\mu_b,\Sigma)$
for some $\mu_a,\mu_b\in\mathbb{R}^n$ and covariance matrix $\Sigma$. In general, I don’t really expect the class-conditional distributions to be Gaussian, nor the class-conditional covariances to be independent of the class. But I do expect something broadly like this, where the distributions are concentrated around their class-conditional means, with probability falling off as you move further from the class-conditional mean (hence unimodality), and where the class-conditional variances are not too big relative to the distance between the clusters.
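As a minimal sketch of this setup (the specific choices below, including 2-D features, $\mu_a=(1,0)$, $\mu_b=(0,1)$, an isotropic $\Sigma$, the probe direction $\mu_a+\mu_b$, and the threshold, are all illustrative assumptions rather than anything from the post): sample from the four class-conditional Gaussians and check that a single projection plus threshold picks out the a AND b cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choices (not from the original comment): 2-D features,
# mu_a and mu_b orthogonal, isotropic class-conditional covariance.
mu_a = np.array([1.0, 0.0])
mu_b = np.array([0.0, 1.0])
sigma = 0.1          # std dev of each coordinate; Sigma = sigma^2 * I
n_per_class = 1000

means = {
    (False, False): np.zeros(2),
    (True, False): mu_a,
    (False, True): mu_b,
    (True, True): mu_a + mu_b,
}

# Sample from each class-conditional Gaussian N(mean, sigma^2 I).
xs, labels = [], []
for (a, b), mean in means.items():
    xs.append(mean + sigma * rng.standard_normal((n_per_class, 2)))
    labels.append(np.full(n_per_class, a and b))
xs = np.concatenate(xs)
labels = np.concatenate(labels)

# Linear probe for "a AND b": project onto mu_a + mu_b and threshold.
direction = mu_a + mu_b
projections = xs @ direction
threshold = 1.5      # between the projected AND mean (2.0) and the other projected means (<= 1.0)
predictions = projections > threshold

print("accuracy of the linear probe for a AND b:", np.mean(predictions == labels))
```

With class-conditional variance small relative to the spacing of the means, the probe is essentially perfect; as $\sigma$ grows toward that spacing, the clusters overlap and the separation degrades, which is the failure mode the variance condition is meant to rule out.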
Given that longer explanation, does the unimodality thing still seem directionally wrong?
Oops, I misunderstood what you meant by unimodality earlier. Your comment seems broadly correct now (except for the variance thing). I would still guess that unimodality isn’t precisely the right well-behavedness desideratum, but I retract the “directionally wrong”.