You write “This residual stream fraction data seems like evidence of something. We just don’t know how to put together the clues yet.” I am happy to say that there is a simple explanation—simple, at least, to those of us experienced in high-dimensional geometry. Weirdly, in spaces of high dimension, almost all vectors are almost at right angles. Your activation space has 1600 dimensions. Two randomly selected vectors in this space have an angle of between 82 and 98 degrees, 99% of the time. It’s perfectly feasible for this space to represent zillions of concepts almost at right angles to each other. This permits mixtures of those concepts to be represented as linear combinations of the vectors, without the base concepts becoming too confused.
Now, consider a random vector, w (for ‘wedding’). Set 800 of the coordinates of w to 0, producing w’. The angle between w and w’ will be about 45 degrees, since the surviving coordinates carry about half of the squared length, so $\cos\theta = \lVert w' \rVert / \lVert w \rVert \approx 1/\sqrt{2}$. This is much closer than any randomly chosen non-wedding concept, which sits near 90 degrees. This is why a substantial truncation of the wedding vector is still closer to wedding than it is to anything else.
Epistemic status: Medium strong. High-dimensional geometry is one of the things I do for my career. But I did all the calculations in my head, so there’s a 20% chance of my being quantitatively wrong. You can check my claims with a little algebra.
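Both claims are also easy to check numerically. Here is a minimal sketch, assuming the vectors are i.i.d. Gaussian (an assumption of this illustration; real activation vectors are not Gaussian, but the angle statistics are the same for any rotation-invariant distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1600

# Claim 1: two random vectors in R^1600 are nearly orthogonal.
u = rng.standard_normal((10_000, d))
v = rng.standard_normal((10_000, d))
cos = np.sum(u * v, axis=1) / (
    np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
)
angles = np.degrees(np.arccos(cos))
print(np.quantile(angles, [0.005, 0.995]))  # roughly [86, 94]: well inside 82-98

# Claim 2: zeroing 800 of the 1600 coordinates tilts a vector by ~45 degrees.
w = rng.standard_normal(d)
w_trunc = w.copy()
w_trunc[:800] = 0.0
cos_t = w @ w_trunc / (np.linalg.norm(w) * np.linalg.norm(w_trunc))
print(np.degrees(np.arccos(cos_t)))  # close to 45
```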
> Weirdly, in spaces of high dimension, almost all vectors are almost at right angles.
This part, I can imagine. With a fixed reference vector written as $(1, 0, 0, \ldots, 0)$, a second random vector has many dimensions to distribute its length along, $(x_1, x_2, x_3, \ldots, x_d)$, while for alignment with the reference (the scalar product) only the first entry $x_1$ contributes.
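Made quantitative (standard concentration facts, stated from memory): for a random unit vector $x$, symmetry gives $\mathbb{E}[x_1^2] = 1/d$, so the typical overlap with the reference is $|\cos\theta| = |x_1| \sim 1/\sqrt{d}$. For $d = 1600$ that is about $0.025$, i.e. angles within a few degrees of 90°, consistent with the 82–98° window above.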
> It’s perfectly feasible for this space to represent zillions of concepts almost at right angles to each other.
This part I struggle with. Is there an intuitive argument for why this is possible?
If I assume smaller angles, below 60° or so, a non-rigorous argument could be:
- each vector blocks a 30°-circle around it on the d-hypersphere[1] (if the circles of two vectors touch, their relative angle is 60°)
- an estimate for the blocked area: the circle is mostly a ‘flat’ (d−1)-dimensional disk of radius $30°/(1\,\text{rad}) = \pi/6 \approx 0.6$, whose area scales as $A_\text{vector} \sim (0.6)^{d-1}$
- the full hypersphere has a surface area with a similar pre-factor but full radius, $A \sim (1)^{d-1}$
- thus we can expect to fit a number of vectors $N$ that scales roughly like $N \sim A/A_\text{vector} \sim (1/0.6)^{d-1}$, which is exponential growth in $d$ (a numerical sanity check follows below)
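As a sanity check on this scaling (my own illustration, not part of the original comment), one can greedily collect random unit vectors that stay at least 60° away from every previously kept vector and its antipode, matching the footnote. The count that fits should climb steeply with $d$, though greedy random packing only lower-bounds what an optimal tiling achieves:

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_pack(d, min_angle_deg=60.0, n_candidates=50_000):
    """Lower-bound how many unit vectors fit in R^d with pairwise
    angles of at least min_angle_deg (antipodes are blocked too)."""
    max_cos = np.cos(np.radians(min_angle_deg))
    kept = np.zeros((0, d))
    for _ in range(n_candidates):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        # keep v only if it clears every stored vector and its antipode
        if kept.shape[0] == 0 or np.max(np.abs(kept @ v)) <= max_cos:
            kept = np.vstack([kept, v])
    return kept.shape[0]

for d in (3, 6, 9, 12, 15):
    print(d, greedy_pack(d))
```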
For a proof, one would need to check whether it is possible to tile the surface efficiently with the $A_\text{vector}$ circles. This seems clearly true for tiny angles (we can stack spheres in approximately flat space just fine), but it seems a lot less obvious for larger angles. For example, full orthogonality would mean 90° angles, and my estimate would still give $N \sim (1/(\pi/4))^{d-1} \approx (1.27)^{d-1}$, an exponential estimate for the number of strictly orthogonal states, although these are definitely not exponentially many (only $d$ mutually orthogonal directions exist).
[1] and a copy of that circle on the opposite end of the sphere ↩︎
Update: I found a proof of the “exponential number of near-orthogonal vectors” in these lecture notes https://www.cs.princeton.edu/courses/archive/fall16/cos521/Lectures/lec9.pdf
From my understanding, the proof quantifies just how likely near-orthogonality becomes in high-dimensional spaces and from that derives a probability that many randomly chosen states are pairwise near-orthogonal.
This does not quite help my intuitions, but I’ll just assume that the question “is it possible to tile the surface efficiently with circles even if their radius gets close to the 45° threshold?” resolves to “yes, if the dimensionality is high enough”.
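For reference, the core of the argument in those notes, reconstructed from memory with approximate constants: concentration of measure on the sphere gives, for independent random unit vectors $u, v \in \mathbb{R}^d$, $\Pr[\,|\langle u, v \rangle| > \epsilon\,] \le 2e^{-d\epsilon^2/2}$. Drawing $N$ vectors and union-bounding over the fewer than $N^2$ pairs, a pairwise $\epsilon$-near-orthogonal family exists as long as $2N^2 e^{-d\epsilon^2/2} < 1$, i.e. for $N$ up to roughly $e^{d\epsilon^2/4}$, which is exponential in $d$.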
One interesting aspect of these considerations is that with growing dimensionality, the definition of near-orthogonality can be made tighter without losing the exponential number of vectors. This should define a natural signal-to-noise ratio for information encoded in this fashion.
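Making that concrete with the bound above (my extrapolation, not a claim from the lecture notes): inverting $N \sim e^{d\epsilon^2/4}$ gives $\epsilon \sim 2\sqrt{\ln N / d}$, so for a fixed number of stored concepts the guaranteed worst-case crosstalk shrinks like $1/\sqrt{d}$. For example, with $d = 1600$ and $N = 10^6$, this allows pairwise overlaps below roughly $\epsilon \approx 2\sqrt{13.8/1600} \approx 0.19$.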