Yair Halberstadt comments on I found >800 orthogonal “write code” steering vectors

Yair Halberstadt 16 Jul 2024 5:34 UTC
2 points

And it turns out that this algorithm works and we can find steering vectors that are orthogonal (and have ~0 cosine similarity) while having very similar effects.

Why ~0 and not exactly 0? Are these not perfectly orthogonal? If not, would it be possible to modify them slightly so they are perfectly orthogonal, then repeat, just to exclude Fabien Roger’s hypothesis?
- Jacob G-W 16 Jul 2024 5:52 UTC
  6 points
  I only included $\sim$ because computers use finite-precision floating-point arithmetic, so the vectors might not be perfectly orthogonal (there is usually some numerical error). The code projects vectors into the subspace orthogonal to the previous vectors, so they should be as close to orthogonal as possible. My code asserts that the pairwise cosine similarity is $\leq 10^{-6}$ for all the vectors I use.
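
To make the reply above concrete, here is a minimal sketch of projecting each new vector into the subspace orthogonal to the previous ones and then checking the pairwise cosine similarity. It is not the post's actual code: the names `project_out`, `orthogonal_set`, and `max_abs_cosine`, the use of NumPy, the random stand-in vectors, and the choice to normalize each vector are all assumptions made for illustration.

```python
import numpy as np

def project_out(v, basis):
    """Subtract from v its component along each (orthonormal) vector in basis."""
    for b in basis:
        v = v - np.dot(v, b) * b
    return v

def orthogonal_set(candidates):
    """Hypothetical helper: orthogonalize candidates one at a time."""
    basis = []
    for v in candidates:
        w = project_out(np.asarray(v, dtype=np.float64), basis)
        # A second pass mops up most of the rounding error left by the first.
        w = project_out(w, basis)
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            continue  # candidate was (numerically) inside the existing span
        basis.append(w / norm)  # normalizing is an assumption of this sketch
    return np.stack(basis)

def max_abs_cosine(vectors):
    """Largest absolute cosine similarity between two distinct rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    gram = unit @ unit.T
    np.fill_diagonal(gram, 0.0)  # ignore each vector's similarity with itself
    return np.abs(gram).max()

# Random vectors stand in for steering vectors; the shape is arbitrary.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(50, 512))
vectors = orthogonal_set(candidates)
worst = max_abs_cosine(vectors)

# The same style of check described in the reply: tiny, but not exactly zero.
assert worst <= 1e-6
```

In floating point the largest off-diagonal cosine similarity typically comes out tiny but nonzero, which is why the check is written as a tolerance and not as an equality with 0.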