It’s a good experiment to run, but the answer is “no, the results are not similar.” From the post (the first bit of emphasis added):
> I hypothesize that the reason why the method works is due to the noise-stability of deep nets. In particular, my subjective impression (from experiments) is that for random steering vectors, *there is no Goldilocks value of R which leads to meaningfully different continuations*. In fact, if we take random vectors with the same radius as “interesting” learned steering vectors, the random vectors typically lead to uninteresting re-phrasings of the model’s unsteered continuation, if they even lead to any changes (a fact previously observed by Turner et al. (2023))[7][8]. Thus, in some sense, learned vectors (or more generally, adapters) at the Goldilocks value of R are very special; the fact that they lead to *any* downstream changes at all is evidence that they place significant weight on structurally important directions in activation space[9].
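For anyone who wants to try the comparison themselves, here is a minimal sketch of the control experiment the quote describes: add a vector of radius R to the residual stream at one layer and compare the steered continuation against the unsteered one. This is my own illustration, not code from the post; the model (`gpt2`), the layer index, the radius `R`, and the prompt are all placeholder assumptions.

```python
# Minimal sketch of the random-vector control described in the quote:
# add a vector of radius R to one layer's hidden states and compare
# the steered continuation against the unsteered one. The model, layer,
# radius, and prompt are illustrative assumptions, not values from the post.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(0)

model_name = "gpt2"  # assumed model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6   # assumed injection layer
R = 10.0    # assumed radius, matched to the norm of a learned steering vector
d = model.config.hidden_size


def steer_and_generate(vec, prompt="The meaning of life is"):
    """Generate greedily while adding `vec` to the layer's hidden states."""
    def hook(module, inputs, output):
        # GPT-2 blocks return a tuple; hidden states are the first element.
        hidden = output[0] + vec.to(dtype=output[0].dtype, device=output[0].device)
        return (hidden,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(hook)
    try:
        ids = tok(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=30, do_sample=False,
                             pad_token_id=tok.eos_token_id)
    finally:
        handle.remove()
    return tok.decode(out[0], skip_special_tokens=True)


# Random control: a direction drawn uniformly at random, scaled to radius R.
random_vec = torch.randn(d)
random_vec = R * random_vec / random_vec.norm()

print("unsteered:", steer_and_generate(torch.zeros(d)))
print("random R :", steer_and_generate(random_vec))
# To reproduce the actual comparison, replace `random_vec` with a learned
# steering vector of the same norm and look for qualitative differences.
```

Per the quoted observation, the random-vector run should mostly produce a rephrasing of the unsteered continuation at any R, whereas a learned vector at its Goldilocks radius changes the continuation qualitatively.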
Thanks! I feel dumb for missing that section. Interesting that this is so different from random.