It’s great to have a LessWrong post that states the relationship between expected quality and a noisy measurement of quality:
E[Quality]=0.5⋅Performance
(Why 0.5? Remember that performance is a sum of two random variables with standard deviation 1: the quality of the intervention and the noise of the trial. So when you see a performance number like 4, in expectation the quality of the intervention is 2 and the contribution from the noise of the trial (i.e. how lucky you got in the RCT) is also 2.)
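As a sanity check on that factor of 0.5, here is a minimal simulation sketch. It assumes, as in the example, that quality and noise are independent standard normals; the function names, sample size, seed, and window width are illustrative choices of mine, not anything from the post.

```python
import numpy as np

def simulate(n=1_000_000, seed=0):
    """Draw independent standard-normal quality and noise; performance is their sum."""
    rng = np.random.default_rng(seed)
    quality = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    return quality, quality + noise

def regression_slope(quality, performance):
    """Least-squares slope of quality on performance: Cov(Q, P) / Var(P)."""
    return np.cov(quality, performance)[0, 1] / np.var(performance, ddof=1)

def mean_quality_near(quality, performance, target, half_width=0.1):
    """Average quality among trials whose performance landed close to `target`."""
    mask = np.abs(performance - target) < half_width
    return quality[mask].mean()

quality, performance = simulate()
print("slope of quality on performance:", regression_slope(quality, performance))
print("mean quality when performance is near 4:",
      mean_quality_near(quality, performance, 4.0))
```

With these settings the slope should come out close to 0.5, and the average quality among trials with performance near 4 should come out close to 2, up to sampling noise.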
We previously had a popular post on this topic, the tails come apart post, but it actually made a subtle mistake when stating this relationship. It says:
For concreteness (and granting normality), an R-square of 0.5 (corresponding to an angle of sixty degrees) means that +4SD (~1/15000) on a factor will be expected to be ‘merely’ +2SD (~1/40) in the outcome—and an R-square of 0.5 is remarkably strong in the social sciences, implying it accounts for half the variance.
The example under discussion in this quote is the same as the example in this post, where quality and noise have the same variance, and thus R^2=0.5. And superficially it seems to be stating the same thing: the expectation of quality is half the measurement.
But actually, this newer post is correct, and the older post is wrong. The key is that “Quality” and “Performance” in this post are not measured in standard deviations. Their standard deviations are 1 and √2, respectively. Elaborating on that: Quality has a variance, and standard deviation, of 1. The variance of Performance is the sum of the variances of Quality and noise, which is 2, and thus its standard deviation is √2. Now that we know their standard deviations, we can scale them to units of standard deviation, and obtain Quality (unchanged) and Performance/√2. The relationship between them is:
E[Quality] = (1/√2) ⋅ (Performance/√2)
That is equivalent to the relationship stated in this post.
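Spelling out that equivalence step by step, as a worked sketch. I write the expectation as conditional on the observed performance, which is my notation rather than the post's:

```latex
\begin{align*}
\mathbb{E}[\text{Quality} \mid \text{Performance}]
  &= \frac{1}{\sqrt{2}} \cdot \frac{\text{Performance}}{\sqrt{2}} \\
  &= \frac{\text{Performance}}{2} \\
  &= 0.5 \cdot \text{Performance}
\end{align*}
```

For a raw performance of 4, that is 4/√2 ≈ 2.83 standard deviations of performance, and multiplying by 1/√2 gives an expected quality of 2, matching the example above.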
More generally, notating the variables in units of standard deviation as Z_x and Z_y (since they are “z-scores”),
E[Z_y] = ρ⋅Z_x
where ρ is the correlation coefficient. So if your noisy measurement of quality is Z_x standard deviations above its mean, then the expectation of quality is ρ⋅Z_x standard deviations above its mean. It is ρ² that is variance explained, and it is thus 1/2 when the signal and noise have the same variance, which makes ρ itself 1/√2. That’s why in the example in this post, we divide the raw performance by 2, rather than converting it to standard deviations and dividing by 2.
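To make the difference between the two scales concrete, here is a small sketch of the z-score relationship. The helper names are hypothetical, and it assumes the measurement is the signal plus independent noise, as in the example.

```python
import math

def correlation_from_variances(signal_var, noise_var):
    """Correlation between a signal and (signal + independent noise)."""
    return math.sqrt(signal_var / (signal_var + noise_var))

def expected_quality_z(measurement_z, rho):
    """Expected z-score of quality given the z-score of the noisy measurement."""
    return rho * measurement_z

rho = correlation_from_variances(1.0, 1.0)  # equal variances, so rho**2 is 0.5
measurement_z = 4.0                         # measurement is +4 SD on its own scale

# On the standard deviation scale, multiply by rho, not by rho**2.
quality_z = expected_quality_z(measurement_z, rho)

# Same answer via raw units: +4 SD of performance is 4 * sqrt(2) raw units,
# and halving the raw number gives the expected quality (whose SD is 1).
raw_performance = measurement_z * math.sqrt(2.0)
quality_raw = 0.5 * raw_performance

print("rho:", rho)
print("expected quality in SD:", quality_z)
print("expected quality via raw units:", quality_raw)
```

Under these assumptions a measurement at +4 SD corresponds to an expected quality of about +2.83 SD, not +2 SD. The factor of one half applies to the raw performance number, not to its z-score.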
I think it’s important to understand the relationship between the expected value of an unknown and the value of a noisy measurement of it, so it’s nice to see a whole post about this relationship. I do think it’s worth explicitly stating the relationship on a standard deviation scale, which this post doesn’t do, but I’ve done that here in my comment.