What definition of correlation are you using?
This one. As far as I know, it’s the only kind of correlation that the cosine interpretation is valid for.
If you want to verify my claim, it will help to assume that your variables have mean 0 and variance 1, in which case the correlation between X and Y is just E[XY]. But correlation is invariant under shifting and positive scaling, so this is fully general.
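For anyone who would rather check this numerically than on paper, here is a minimal sketch using only the Python standard library. The sample data is synthetic and the helper names (`standardize`, `correlation`, `cosine`) are my own, not from any library. It works with empirical averages over a finite sample rather than true expectations, so it illustrates the identity instead of proving it.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def standardize(v):
    """Shift to mean 0 and scale to variance 1 (population variance)."""
    m = mean(v)
    sd = math.sqrt(mean([(a - m) ** 2 for a in v]))
    return [(a - m) / sd for a in v]

def correlation(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx = math.sqrt(mean([(a - mx) ** 2 for a in x]))
    sy = math.sqrt(mean([(b - my) ** 2 for b in y]))
    return cov / (sx * sy)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Synthetic, made-up data: y is a noisy linear function of x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x]

zx, zy = standardize(x), standardize(y)

r = correlation(x, y)                          # correlation of the raw samples
e_xy = mean([a * b for a, b in zip(zx, zy)])   # empirical E[XY] after standardizing
cos_angle = cosine(zx, zy)                     # cosine between the standardized vectors

# Shifting and scaling by positive constants leaves the correlation unchanged.
r_shifted = correlation([3 * a + 7 for a in x], [0.5 * b - 2 for b in y])

print(r, e_xy, cos_angle, r_shifted)
```

All four printed values should agree up to floating-point error. Scaling one variable by a negative constant would flip the sign of the correlation, which is why the check uses positive scale factors.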
Edit: I suppose I was imprecise when referring to the correlation between bit strings; what I mean there is simply the correlation between any pair of corresponding bits. This sort of confusion appears to be standard.
Yes, I was wondering what you meant by the correlation of a string. Also, the definition doesn’t apply to binary variables unless you somehow interpret bits as numbers, but as you say, it doesn’t matter how you do so. There’s a reason I asked what definition you were using, not the original post.
Which confusion? The confusion between a random variable and a sequence of iid draws from it? And what do you mean by standard?
By “this sort of confusion” I meant the ambiguity between the correlation of two bit strings and the correlations between the individual bits. By “standard” I mean that I didn’t make this up; I’ve seen other people do this.
Anyway, perhaps it’s better to focus on the more general (and less notation-abusing) example I gave.