It’s worth noting that the Center for Applied Rationality ran the June minicamp experiment using a standard but unusual statistical method of sorting applicants into pairs that seemed of roughly matched prior ability / prior expected outcome, and then flipping a coin to pick which member of each pair would be admitted.
As an aside, if you’re interested in looking up more about this nifty experimental design trick, the magic keyword is “blocking”. The idea of randomized block designs dates back to Fisher.
I’ve found blocking to be really useful for my small-scale experiments for 2 different reasons:
Often, I’m worried about simple randomization leading to an imbalance between control and experimental; if I’m only getting 20 total datapoints on something, then randomization could easily lead to something like 14 control and 6 experimental datapoints, throwing out a lot of statistical power compared to 10 control and 10 experimental. If I pair days and randomize within each pair, then I know I will get 10/10, without worrying about breaking blinding.
Blocking is the natural way to handle multiple-day effects or trends: if I think lithium operates slowly, I will pair entire weeks or months rather than days, instead of hoping that enough experimental and control days happen to form runs which reveal any trend rather than wash it out in averaging.
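To make both reasons concrete, here is a minimal sketch in Python. It assumes that "simple randomization" means an independent fair coin flip per datapoint, and that blocks are paired consecutively; the function names `prob_split_at_least_as_uneven` and `paired_schedule` are made up for illustration and do not come from any library.

```python
import random
from math import comb


def prob_split_at_least_as_uneven(n, k):
    """Chance that the smaller arm ends up with k or fewer of n datapoints
    when every datapoint gets its own fair coin flip (requires k < n / 2)."""
    one_tail = sum(comb(n, i) for i in range(k + 1))
    # Either arm can be the small one, so count both symmetric tails.
    return 2 * one_tail / 2 ** n


def paired_schedule(n_blocks, block_len=1, rng=None):
    """Return one label per day. Consecutive blocks of `block_len` days are
    paired, and a coin flip picks which block in each pair is experimental.
    An unpaired final block (odd n_blocks) is dropped."""
    rng = rng or random.Random()
    schedule = []
    for _ in range(n_blocks // 2):
        if rng.random() < 0.5:
            first, second = "experimental", "control"
        else:
            first, second = "control", "experimental"
        schedule += [first] * block_len + [second] * block_len
    return schedule


# Simple randomization of 20 datapoints: a 14/6 split or worse
# happens about 11.5% of the time.
print(round(prob_split_at_least_as_uneven(20, 6), 3))

# Paired days: always exactly 10 control and 10 experimental.
days = paired_schedule(20)
print(days.count("control"), days.count("experimental"))

# Paired weeks for a slow-acting intervention: 4 weeks, 7 days each.
weeks = paired_schedule(4, block_len=7)
print(weeks)
```

Setting `block_len` to 7 or 30 gives the week or month pairing described above: every day inside a block shares one condition, so a slow effect has time to build up, while the coin flip within each pair still keeps the assignment random.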