In particle physics we use blinding of variables, which is somewhat distinct from what’s called a blind study in medicine, to deal with this problem. I’ll use my own thesis as an example: I was trying to measure charm mixing parameters, imaginatively denoted x and y. There were some theoretical predictions: both were expected to be less than 1%, x should be smaller in absolute magnitude than y, and they should have opposite signs; in addition, there were prior measurements with which we would presumably be consistent, within the errors.
Now, there was quite a bit of coding to do before I had a result, and a whole bunch of test runs on real data. So to avoid the problem outlined above, whenever I ran a test on actual data (as opposed to simulated data), I would not print out the actual results, but results with an unknown (random) number added. So the code would look like this (much simplified):
double resultForX = getX();
double errorForX = getXError();
double blindValueX = getRandomUsingSeed(42);
print "Result is " + (resultForX + blindValueX) + " plus-minus " + errorForX;
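In runnable form, the idea might look like the following Python sketch. Everything here is invented for illustration: the real analysis was not in Python, and get_x, get_x_error, the seed, and the size of the blinding offset are all stand-ins.

```python
import random

def get_x():
    # Hypothetical stand-in for the real fit that extracts x (in percent).
    return 0.4

def get_x_error():
    # Hypothetical stand-in for the fit's error estimate on x.
    return 0.2

def blinded_result(seed=42):
    # Add a random offset derived from a fixed seed. The analyst records
    # only the seed and never looks at the offset itself, so the printed
    # value is blinded.
    blind_value_x = random.Random(seed).uniform(-5.0, 5.0)
    return get_x() + blind_value_x, get_x_error()

value, error = blinded_result()
print(f"Result is {value:.3f} plus-minus {error:.3f}")
```

Because the offset comes from a seeded generator rather than, say, the system clock, every run adds the same unknown number, which is what makes runs comparable.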
So if I got, say, x=3%, very unexpected, I would not be tempted to go look for an error in the code; I’d have no idea whether I genuinely had an unexpected result, or a vanilla result with a big blinding factor.
Note that I used the same random seed every time, so I would have comparable results from run to run; if I changed something and suddenly had 1% in place of 3%, I knew something was up. But I still had no idea whether I had a New-Physics-indicating result or just a confirmation of theory.
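The run-to-run comparability can be checked directly: with a fixed seed the offset is a constant, so the difference between two blinded runs equals the difference between the true results. A sketch, again with invented numbers:

```python
import random

BLIND_SEED = 42  # fixed seed: the same offset is regenerated every run

def blind(true_value, seed=BLIND_SEED):
    # A constant, unknown offset; the analyst knows only the seed.
    return true_value + random.Random(seed).uniform(-5.0, 5.0)

# Two hypothetical test runs, before and after a code change that
# shifted the true result from 0.4% to 0.6%:
run_a = blind(0.4)
run_b = blind(0.6)

# The blinded values shift by exactly the true shift, so a sudden jump
# between runs still signals that something changed in the code.
print(run_b - run_a)
```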
To end blinding I had to get permission from a review committee (not my thesis committee, but other workers inside the same experiment); then I commented out the blinding lines and printed the true values. Alas, they were quite consistent with expectations.
There is a coda to this: At my thesis defense, one of the committee asked an excellent question, which I had to answer before I could submit the thesis. In the course of answering it, I did in fact find (to my horror!) a sign error in my code. Fortunately it did not change the final result very much, but it was highly embarrassing.
This approach isn’t suitable for every problem, but we use it wherever we can.
To end blinding I had to get permission from a review committee

Sounds a little bit like a code review. (And, pursuing a theme I’ve raised earlier, a probably effective tactic for leveraging collective intelligence against individual bias.)
I don’t understand how this is supposed to help. Or even not hurt.
There is a true result of the run, stored in the variable resultForX. While I’m developing my code, I don’t want to know that true value, because of the surprisingness bias outlined in the post. I do, however, want to be able to compare results between test runs. Thus I add a random value, blindValueX, which I do not know; I only know the random seed that produces it. I never print the true result until I’ve finalised the code and done all my testing for systematic errors.
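Unblinding is then mechanical. In the account above the author simply commented out the blinding line; an equivalent way, sketched here in Python with made-up numbers, is to regenerate the offset from the recorded seed and subtract it:

```python
import random

def blind_offset(seed):
    # Regenerable offset: anyone with the seed can reproduce it exactly.
    return random.Random(seed).uniform(-5.0, 5.0)

true_x = 0.4                           # hypothetical true fit result (percent)
blinded_x = true_x + blind_offset(42)  # what the analyst sees during development

# After review-committee approval: regenerate the offset and subtract it.
unblinded_x = blinded_x - blind_offset(42)
print(unblinded_x)
```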
Okay; I see. Is that a common practice? I’d never heard of it before.
It is, at any rate, quite common in particle physics, although not every analysis uses it. I can’t speak for other fields.
Hmm. I wonder if this would make a top-level post, with some example plots and more in-depth description? Practical methods used in science for avoiding bias, 101.