I don’t think this is the sort of post that ages well or poorly, since it isn’t topical, but I think it turned out pretty well. It gradually builds from preliminaries that most readers have probably seen before into some pretty counterintuitive facts that aren’t widely appreciated.
At the end of the post, I listed three questions and wrote that I hoped to write about some of them soon. I never did, so I figured I’d use this review to briefly give my takes.
This comment from Fabien Roger tests some of my modeling choices for robustness, and finds that the surprising results of Part IV hold up when the noise is heavier-tailed than the signal. (I’m sure there’s more to be said here, but I probably don’t have time to do more analysis by the end of the review period.)
My basic take is that this really is a point in favor of well-evidenced interventions, but that the best-looking speculative interventions are nevertheless better. This is because I think “speculative” here mostly refers to partial measurement rather than noisy measurement. For example, maybe you can only foresee the first-order effects of an intervention, but not the second-order effects. If the first-order effect is a (known) quantity X1 and the second-order effect is an (unknown) quantity X2, then modeling the second-order effect as zero (and thus estimating the quality of the intervention as X1) isn’t a noisy measurement; it’s a partial measurement. It’s still your best guess given the information you have.
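The same point in symbols, as a minimal sketch that additionally assumes (this is not stated above) that the unforeseen effect X2 is independent of X1 and has prior mean zero:

```latex
\begin{aligned}
X &= X_1 + X_2, \\
\hat{X} &= X_1 + \mathbb{E}[X_2 \mid X_1] = X_1 + \mathbb{E}[X_2] = X_1, \\
\mathbb{E}[X - \hat{X} \mid X_1] &= \mathbb{E}[X_2 \mid X_1] = 0.
\end{aligned}
```

So the estimate X1 is almost always off by something, but given what you know, its error has mean zero.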
I haven’t thought this through very much. I expect good counter-arguments and counter-counter-arguments to exist here.
No. Or rather, only if the measurement is guaranteed to be exactly correct. To see this, observe that the variance of a noisy, unbiased measurement is greater than the variance of the quantity you’re trying to measure (assuming the noise is independent of that quantity, with equality only when the noise is zero), whereas the variance of a noiseless, partial measurement is less than the variance of the quantity you’re trying to measure.
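A quick simulation makes the variance comparison concrete. It is a minimal sketch, not taken from the original post, and it assumes (for illustration only) that the true quantity is X = X1 + X2, with X1, X2, and the measurement noise all independent standard normals.

```python
import random


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def simulate_variances(n=100_000, seed=0):
    """Sample X = X1 + X2, a noisy measurement X + N, and a partial measurement X1.

    Illustrative assumption: X1, X2 and the noise N are independent standard normals.
    """
    rng = random.Random(seed)
    true_vals, noisy, partial = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)     # the part of the quantity we can observe
        x2 = rng.gauss(0.0, 1.0)     # the part we cannot observe
        noise = rng.gauss(0.0, 1.0)  # measurement error, independent of X
        true_vals.append(x1 + x2)
        noisy.append(x1 + x2 + noise)  # unbiased but noisy measurement of X
        partial.append(x1)             # noiseless but partial measurement of X
    return variance(true_vals), variance(noisy), variance(partial)


var_true, var_noisy, var_partial = simulate_variances()
print(f"Var[X]                   = {var_true:.2f}")
print(f"Var[noisy measurement]   = {var_noisy:.2f}")
print(f"Var[partial measurement] = {var_partial:.2f}")
```

Under these assumptions the three printed values should land near 2, 3, and 1: noise adds variance on top of the true quantity, while leaving out X2 removes some.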
Real-world measurements are absolutely partial. They are, like, mind-bogglingly partial. This point deserves a separate post, but consider for instance the action of donating $5,000 to the Against Malaria Foundation. Maybe your measured effect from the RCT is that it’ll save one life: 50 QALYs or so. But this measurement neglects the meat-eating problem: the expected-child you’ll save will grow up to eat expected-meat from factory farms, likely causing a great amount of suffering. But then you remember: actually there’s a chance that this child will have a one eight-billionth stake in determining the future of the lightcone. Oops, actually this consideration totally dominates the previous two. Does this child have better values than the average human? Again: mind-bogglingly partial!
(The measurements are also, of course, noisy! RCTs are probably about as un-noisy as it gets; by comparison, making your best guess about the quality of an intervention by drawing inferences from uncontrolled macroeconomic data is much noisier. So the answer is: generally both noisy and partial, but in some sense much more partial than noisy, though I’m not sure how much that comparison matters.)
The lessons of this post do not generalize to partial measurements at all! This post is entirely about noisy measurements. If you’ve partially measured the quality of an intervention, estimating the un-measured part using your prior will give you an estimate of intervention quality that you know is probably wrong, but the expected value of your error is zero.
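To see the contrast with noisy measurement, here is a hedged simulation sketch (my own setup, not the post’s model). It assumes each option’s true quality is X = X1 + X2 with independent standard normal parts; a noisy measurement reports X plus independent standard normal noise, while a partial measurement reports X1 and fills in X2 with its prior mean of zero. In each trial we pick the option that looks best and record how much its estimate overstates its true quality.

```python
import random


def selection_error(n_options=100, trials=2_000, seed=1):
    """Average (estimate - true value) for the option that looks best.

    Illustrative assumption: true quality X = X1 + X2, with X1, X2 and the
    measurement noise all independent standard normals.
    """
    rng = random.Random(seed)
    noisy_total = 0.0
    partial_total = 0.0
    for _ in range(trials):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        true = [a + b for a, b in zip(x1, x2)]
        noisy = [t + rng.gauss(0.0, 1.0) for t in true]  # X + noise
        partial = x1                                     # X1 + prior mean of X2 (zero)

        # Pick the option that looks best under each kind of measurement.
        best_noisy = max(range(n_options), key=lambda k: noisy[k])
        best_partial = max(range(n_options), key=lambda k: partial[k])

        noisy_total += noisy[best_noisy] - true[best_noisy]
        partial_total += partial[best_partial] - true[best_partial]
    return noisy_total / trials, partial_total / trials


avg_noisy_err, avg_partial_err = selection_error()
print(f"noisy:   chosen option overstated by {avg_noisy_err:.2f} on average")
print(f"partial: chosen option overstated by {avg_partial_err:.2f} on average")
```

Under these assumptions the noisy estimate of the chosen option overshoots by roughly 1.45 on average (the selection effect often called the optimizer’s curse), while the partial estimate’s average error stays near zero, because choosing by X1 tells you nothing about X2.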