Edit: I started out giving a somewhat confusing example, so I’m just going to edit this comment to say something less confusing.
Let X be any random variable with finite, nonzero variance, and define Y as follows: with probability R, Y=X, and otherwise Y is drawn independently from the same distribution as X. Then X and Y are identically distributed and have correlation R.
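For anyone who would rather check this numerically than on paper, here is a minimal simulation sketch. The helper names (`mixture_pair`, `pearson`, `estimate_correlation`) and the choice of an exponential distribution are my own assumptions for illustration; any distribution with finite, nonzero variance should behave the same way.

```python
import random

def mixture_pair(r, draw, rng):
    """Return (x, y): y equals x with probability r, else an independent draw."""
    x = draw(rng)
    y = x if rng.random() < r else draw(rng)
    return x, y

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

def estimate_correlation(r, n=200_000, seed=0):
    """Estimate Corr(X, Y) for the construction above by simulation."""
    rng = random.Random(seed)
    # Assumption: exponential(1) stands in for "any" distribution here.
    draw = lambda g: g.expovariate(1.0)
    pairs = [mixture_pair(r, draw, rng) for _ in range(n)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

if __name__ == "__main__":
    for r in (0.1, 0.5, 0.9):
        print(r, estimate_correlation(r))
```

With a few hundred thousand samples, the estimates should land close to the chosen R, up to sampling noise.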
This one. As far as I know, it’s the only kind of correlation that the cosine interpretation is valid for.
If you want to verify my claim, it will help to assume that your distribution has mean 0 and variance 1, in which case the correlation between X and Y is just E[XY]. But correlation is invariant under shifting and scaling, so this is fully general.
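Spelled out, the verification might go as follows. The symbols B (the indicator of the event that Y is copied from X) and X' (the independent redraw) are notation I am introducing here, and the derivation assumes E[X] = 0 and E[X^2] = 1 as above.

```latex
\begin{align*}
Y &= B X + (1 - B) X', \qquad B \sim \mathrm{Bernoulli}(R),\ X' \overset{d}{=} X,\ \text{with } B, X, X' \text{ independent} \\
E[XY] &= E[B]\,E[X^2] + E[1 - B]\,E[X]\,E[X'] \\
      &= R \cdot 1 + (1 - R) \cdot 0 \cdot 0 = R \\
\operatorname{Corr}(X, Y) &= \frac{E[XY] - E[X]\,E[Y]}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} = \frac{R - 0}{1} = R
\end{align*}
```

The last line uses the fact that Y has the same distribution as X, so it also has mean 0 and variance 1.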
Edit: I suppose I was imprecise when referring to the correlation between bit strings; what I mean there is simply the correlation between any pair of corresponding bits. This sort of confusion appears to be standard.
Yes, I was wondering what you meant by the correlation of a string. Also, the definition doesn’t apply to binary variables unless you somehow interpret bits as numbers, but as you say, it doesn’t matter how you do so. There’s a reason I asked what definition you were using, not the original post.
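To illustrate the point that the numeric interpretation of the bits does not matter, here is a small sketch that builds two bit strings by the same copy-or-redraw rule and compares the correlation under a 0/1 encoding and a -1/+1 encoding. The function names are hypothetical, and the same encoding is applied to both strings.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

def correlated_bits(r, n, seed=0):
    """Two bit strings; each bit of ys copies xs with probability r, else is a fresh fair bit."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.getrandbits(1)
        y = x if rng.random() < r else rng.getrandbits(1)
        xs.append(x)
        ys.append(y)
    return xs, ys

def encode(bits, zero, one):
    """Map each bit to a number: 0 -> zero, 1 -> one."""
    return [one if b else zero for b in bits]

if __name__ == "__main__":
    xs, ys = correlated_bits(0.5, 50_000)
    print(pearson(encode(xs, 0, 1), encode(ys, 0, 1)))    # 0/1 encoding
    print(pearson(encode(xs, -1, 1), encode(ys, -1, 1)))  # -1/+1 encoding
```

The two printed values should agree up to floating-point rounding, since one encoding is just a shift and positive rescaling of the other.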
“This sort of confusion appears to be standard.”
Which confusion? The confusion between a random variable and a sequence of i.i.d. draws from it? And what do you mean by standard?
By “this sort of confusion” I meant the ambiguity between the correlation of two bit strings and the correlations between the individual bits. By “standard” I mean that I didn’t make this up; I’ve seen other people do this.
Anyway, perhaps it’s better to focus on the more general (and less notation-abusing) example I gave.
I haven’t come across that. Could you amplify it?
What definition of correlation are you using?